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Abstract 

This paper proposes a low-complexity algorithm for blind equalization of data in OFDM-based wireless systems 
with general constellation. The proposed algorithm is able to recover data even when the channel changes on a 
symbol-by-symbol basis, making it suitable for fast fading channels. The proposed algorithm does not require any 
statistical information of the channel and thus does not suffer from latency normally associated with blind methods. 
We also demonstrate how to reduce the complexity of the algorithm, which becomes especially low at high SNR. 
Specifically, we show that in the high SNR regime, the number of operations is of the order 0{LN), where L is the 
cyclic prefix length and N is the total number of subcarriers. Simulation results confirm the favorable performance 
of our algorithm. 
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I. Introduction 

Modem wireless communication systems are expected to meet an ever increasing demand for high data rates. A 
major hindrance for such high data rate systems is multipath fading. Orthogonal frequency division multiplexing 
(OFDM), owing to its robustness to multipath fading, has been incorporated in many existing standards (e.g., IEEE 
802.11, IEEE 802.16, DAB, DVB, HyperLAN, ADSL etc.) and is also a candidate for future wireless standards 
(e.g., IEEE 802.20). All current standards use pilot symbols to obtain channel state information (needed to perform 
coherent data detection). This reduces the bandwidth available for data transmission, e.g., the IEEE 802. lln standard 
uses 4 subcarriers for pilots, that is 7.1% of the available bandwidth, of the 56 subcarriers available for transmission. 
Blind equalization methods are advantageous as they do not require regular training/pilots symbols, thus freeing up 
valuable bandwidth. 

Several works exist in literature on blind channel estimation and equalization. A brief classification of these 
works based on a few commonly used constraints/assumptions is given in Table U (note that this list is not 
exhaustive). Broadly speaking, the literature on blind channel estimation can be classified into maximum-likelihood 
(ML) methods and non-ML methods. 

The non-ML methods include approaches based on subspace techniques ifTl- lfTOl . second-order statistics ifTTI . 
ifTll . |fT3l . Cholesky factorization llT4l . iterative methods ifHl . virtual carriers |16| real signal characteristics ifTTl 
and linear precoding |[T2l . ifTSl . Subspace-based methods Hl-llll, f/l- lfTOl generally have lower complexity but 
suffer from slow convergence as they require many OFDM symbols to get an accurate estimate of the channel 
autocorrelation matrix. Blind methods based on second-order statistics IfTTI . |[T2l . |fT3l also require the channel to 
be strictly stationary over several OFDM blocks. More often than not, this condition is not fulfilled in wireless 
scenarios (e.g., as in WLAN and fixed wireless applications). Methods based on Cholesky's factorization |[T4l and 
iterative techniques ifTSll suffer from high computational complexity. 

Several ML-based blind methods have been proposed in literature (201, Ull, ||2Tl-||35l, ||37l. Although they 
incur a higher computational cost, their superior performance and faster convergence is very attractive. These 
characteristics make this class of algorithms suitable for block fading scenarios with short channel coherence time. 
Usually, suboptimal approximations are used to reduce the computational complexity of ML-based methods. Though 
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TABLE I 

Literature Classification 
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m, ETi 


Uses pilots 
to resolve 
phase ambiguity 


m. El, a, mni, mi, 
m, nsi, ma, m, m\, 
m, m, f25i, pel, [371 




Constant modulus constellation 


[21, [31, [61, [91, [121, [131, 

m, mi, ii2Qi, m, 
isa, iia, im, usa, wi 


m, a, in, uni, m 
El, m. Hill 



these methods reduce the complexity of the exhaustive ML search, they still incur a significantly high computational 
cost. Some methods like |[2TI . Il23l . Il24l are sensitive to initialization parameters, while others work only for specific 
constellations (see Table lU). A few ML-based algorithms allow the channel to change on a symbol-by-symbol basis 
(e.g., Il26l . Wit ), however, these algorithms are only able to deal with constant modulus constellations. 

To the best of our knowledge no bUnd algorithm in literature is able to deal with channels that change from one 
OFDM symbol to another when the data symbols are drawn from a general constellation. Contrast this with the 
equalization algorithm presented in this paper. The key features of the blind equalization algorithm presented in 
this paper are that it 



1) works with an arbitrary constellation, 

2) can deal with channels that change for one symbol to the next, 

3) does not assume any statistical information about the channel. 



In addition, we propose a low-complexity implementation of the algorithm by utilizing the special structure of 
partial FFT matrices and prove that the complexity becomes especially low in the high SNR regime. 

The paper is organized as follows. Section Ull describes the system model and Section |lll] describes the blind 
equaUzation algorithm. Section |IV] presents an approximate method to reduce the computational complexity of the 
algorithm, while Section |V] evaluates this complexity in the high SNR regime. Section |Vl] presents the simulation 
results and Section IVIII gives the concluding remarks. 



4 



A. Notation 

We denote scalars with small-case letters, x, vectors with small-case boldface letters, x, while the individual 
entries of a vector h are denoted by h{l). Upper case boldface letters, X, represent matrices while calligraphic 
notation, X, is reserved for vectors in the frequency domain. A hat over a variable indicates an estimate of the 
variable, e.g., /i is an estimate of h. (.)^ and (.)^ denote the transpose and Hermitian operations, while the notation 
stands for element-by-element multiplication. The discrete Fourier transform (DFT) matrix is denoted by Q and 
defined as qi^k = e~^^^''~^^^'^~^^ with k,l = 1,2, - ■ ■ ,N (N is the number of subcarriers in the OFDM symbol), 
while the invrse DFT (IDFT) is denoted as Q^. The notation ||a||g represents the weighted norm defined as 
II a II 3 = a^Ba for some vector a and matrix B. 



II. System Model 



Consider an OFDM system where all the N available subcarriers are modulated by data symbols chosen from 
an arbitrary constellation. The frequency-domain OFDM symbol X, of size N x 1, undergoes an IDFT operation 
to produce the time-domain symbol x, i.e. 

X = VnQ^X. (1) 

The transmitter then appends a length L cyclic prefix (CP) to x and transmits it over the channel. The channel 
h, of maximum length L + 1 < N, is assumed to be constant for the duration of a single OFDM symbol, but 
could change from one symbol to the next. The received signal is a convolution of the transmitted signal with the 
channel observed in additive white circularly symmetric Gaussian noise n ~ M{0, 1). The CP converts the linear 
convolution relationship to circular convolution, which, in the frequency domain, reduces to an element-by-element 
operation. Discarding the CP, the frequency-domain received symbol is given by 

y = ^/pnQX+^r, (2) 



where p is the signal to noise ratio (SNR) and y, H, X ,J\f , are the -point DFT's of y, h, x, and additive noise 
n respectively, i.e. 



U = Q 



h 




1 



1 



1 



X = —=Qx, J\f = —=Qn, and 3^ = —=Qy 



N 



N 



(3) 



Note that h is zero padded before taking its A'^-point DFT. Let consist of first L + 1 columns of Q (i.e., A 
consist of first L + 1 rows of Q^), then 



n = A^h and h = AH. 



(4) 



This allows us to rewrite (O as 



y = ^d.i&g{X)A^h + J^. 



(5) 



III. Blind Equalization Approach 



Consider the input/output equation ([S]), which in its element by element form reads 

y{j) = ^pX{3)a}^h+M{3) (6) 

where aj is the jth column of A. The problem of joint ML channel estimation and data detection for OFDM 
channels can be cast as the following minimization problem 



Jml = min 11;^ - diag(A')A"/i"^ 

N 



min y |3^(i)- VP'^'Waf^P 
1=1 

(i N 
J2 \y{j) - v~p xij)afh\' + J2 \yu) - Vp xij)afhf \ (v 



j=l j=i+l 



where denotes the set of a 
A'(j) up to the time index i, i.e 



1 possible A^— dimensional signal vectors. Let us consider a partial data sequence 



and define Mx^^^ as the corresponding cost function, i.e. 

M;f„ = min ||3^(,) - ^ diag(A'(,))Ag)/if, (8) 

where consists of the first i rows of A^. 

In the following, we pursue an idea for blind equalization of single-input multiple-output systems first inspired 
by lfT9l . Let R be the optimal value for the objective function ^ (we show how to determine R in Section UlI-BI 
further ahead). If Mx^^-, > R, then A'(j) can not be the first i symbols of the ML solution to To prove 

this, let and denote the ML estimates and suppose that our estimate A'(j) satisfies 

^(i) = (i) (9) 
i.e. the estimate A'(j) matches the first i elements of the ML estimate. Then, we can write 

Hi,||2 



R = min \\y - ^ diag(X)A"-h 

h,xen'^ II V 



TV 



= - diag(A'5^)AH + \yij) - X^\j)afh^y 

j=i+i 

N 

= ||3^«-^/^diag(A'(,))A«/x'''f + ^ \yij)-^pX^Hj)afh'''^\\ (10) 

j=i+i 

where the last equation follows from Now, clearly 

\\y^i)-^dmg{X^i))Af,^h^'^f > rnin||:V(i)- Vpdiag(A'(,))Af^)/if (11) 

= ||3^(i)- Vpdiag(A'(,))AH^f, (12) 



'Thus, for example X^2) = [X{l),X{2)f and X^n) = [-^^(1), ■ • ■ , X{N)f = X. 



where h is the argument that minimizes the RHS of ([TTI ). Then 

TV 

j=i+l 

> rnin \\y^i^ - ^ diag(A'(i))A{^)h,f 

= M;,,,,. (13) 

So, for A'(j) to correspond to the first i symbols of the ML solution , we should have ^X(.^ < Note that 
the above represents a necessary condition only. Thus if Af(j) is such that ^ < R, then this does not necessarily 
mean that A'(j) coincides with X^-^. 

This suggests the following method for blind equalization. At each subcarrier frequency i, make a guess of 
the new value of and use that along with previous estimated values A'(l), (i — 1) to construct 



Estimate h so as to minimize M^, in (113] ) and calculate the resulting minimum value of M-i, . If the value of 

— ° ^(i) 

M^^^ < R, then proceed to i + 1. Otherwise, backtrack in some manner and change the guess of for some 
j < i. A problem with this approach is that for i < L + 1, given any choice of X{i), h can always be chosen by 
least-squares to make M-^^^ in ([T3] ) equal to zercQ Then, we will need at least L + 1 pilots defying the blind nature 
of our algorithm. Alternatively, our search tree should be at least L + 1 deep before we can obtain a nontrivial (i.e. 
nonzero) value for M-o . 

An alternative strategy would be to find h using weighted regularized least squares. Specifically, instead of 
minimizing the objective function J ml in equation (O, we minimize the maximum a posteriori (MAP) objective 
function 

Jmap= min {\\h\\l-^ + \\y -^dmg{X)A^hf\ (14) 
where Rh is the autocorrelation matrix of h (in Section ITVl we modify the blind algorithm to avoid the need for 



^Since A^^j is full rank for i < L + 1, diag(Af(i))A(^) is full rank too for each choice of diag(Af(i)) and so we will always find some 
h that will make the objective function in l ll3t zero (since h has L+1 degrees of freedom). 



channel statistics). Now the objective function in ([141 ) can be decomposed as 



Jmap = mill < 



N 



j=i j=i+i 



Given an estimate of the cost function reads 



^^•(,.1, = + 113^ - ^diag(Af(i_i))Ag_i)/if } 



mmse= [7?;^^ +pA(i_i)diag(i'(i_i))"diag(A'(i_i))Ag_ ' ^ 



Hi-i)J 



(15) 



(16) 

with the optimum value (see II38I . Chapter 12, pp. 671) 

h = y/p /?/jA(j„i)diag(A'(j„i))[/ + p diag(A'(j_ i))A«_,)/?,A(,_i)diag(A'(,_i))]-i3^(,_i) (17) 
and corresponding minimum cost (MMSE error) 



(18) 



If we have a guess of X{i), we can update the cost function and obtain ^x^.^ In fact, the cost function Af^ is 
the same as that of ^ with the additional observation y{i) and an additional regressor X{i)af, i.e. 

2 . 









y{i-i) 




diag(A'(,_i))AH 


M-{, = mill < 

'^(o h 



















h 



(19) 



We can thus, recursively update the value M^x^-^ based on M^^ using recursive least squares (RLS) 11381 . i.e. 



(20) 



hi = hi^i + Qi [y[i) - ^ X{i)afhi^i 



(21) 
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where 

9i = ./pi{i)X{ifP,.iai (22) 

lii) = (23) 

l + p\X{i)\^afP,^^a, 

Pi = P,_i-/5 7(i)|i(i)|2p,_ia,afPi_i (24) 

These recursions apply for all i and are initialized by 

M^^_^^ = 0, P_i = Rh, and h_i = 

Now, let R be the optimal value for the regularized objective function in (fT4l ). If the value R can be estimated, we 
can restrict the search of the blind MAP solution X to the offsprings of those partial sequences A'(j) that satisfy 
My, < R. This forms the basis for our exact blind algorithm described below. 

A. Exact Blind Algorithm 

In this subsection, we describe the algorithm used to find the MAP solution of the system. The algorithm employs 



the above set of iterations (EOt — (124]) to update the value of the cost function M-j^^ ^ which is then compared with 
the optimal value R. The input parameters for the algorithm are: the received channel output y, the initial search 
radius r, the modulation constellatioiu ^ and the IxA^ index vector /. 

The algorithm is described as follows (the algorithm is also described by the flowchart in Figure [l]) 

1) (Initialize) Set i = 1, = 1 and set X{i) = 

2) (Compare with bound) Compute and store the metric M-j^^^. If Af^^^ > r, go to 3; else, go to 4; 

3) (Backtrack) Find the largest 1 <j <i such that 

< If there exists such j, set i = j and go to 5; else go to 6. 

4) (Increment subcarrier) If i < set i = i + = 1, = and go to 2; else store current 
A'(^), update r = M.^^^^ and go to 3. 

^Examples of the modulation constellation are Q, are 4-QAM and 16-QAM. We use \Q\ to denote the constellation size and Q{k) for the 
fcth constellation point. For example, in 4-QAM \Q\ =4 and f2(l), • • ■ , f2(4) are the four constellation points of 4-QAM. The indicator 
refers to the last constellation point visited by our search algorithm at the ith subcarrier. 
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5) (Increment constellation) Set = + 1 and X{i) = Go to 2. 

6) (End/Restart) If a full-length sequence ^(^n) has been found in Step 4, output it as the MAP solution and 
terminate; otherwise, double r and go to 1. 

The essence of the algorithm is to eliminate any choice of the input that increments the objective function beyond 
the radius r. When such a case is confronted, the algorithm backtracks (Step 3 then Step 5) to the nearest subcarrier 
whose alphabet has not been exhausted (the nearest subcarrier will be the current subcarrier if its alphabet set is 
not exhausted). 

The other dimension the algorithm works on is properly sizing r; if r is too small such that we are not able to 
backtrack, the algorithm doubles r (Step 3 then Step 6). If on the other hand r is too large that we reach the last 
subcarrier too fast, the algorithm reduces r to the most recent value of the objective function, (r = Mx^^^^) and 
backtracks (Step 4 then Step 3). 




Fig. 1. Flowchart of the blind algorithm. 
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Remark 1: The backtracking algorithm depends heavily on calculating the cost function using (|20l)-(l24l). In the 
constant modulus case, the values of ^[^^(i)^ in equations ( [23] ) and (l24l) become constant (equal to p Ex) for all 
i, and the values of ^{i) and Pi become 

= ^^ r ^Hp (25) 

Pi = P^^l- p£xl{i)P^-la^afPi^l, (26) 

which are independent of the transmitted signal and thus can be calculated offline. 

Remark 2: The algorithm can also be used for a pilot-based standard. In this case, when the algorithm reaches a 
pilot holding-subcarrier, no backtracking is performed as the value of the data carrier is known perfectly. In the 
presence of pilots, it is wise to execute the algorithms over the pilot-holding subcarriers first and subsequently move 
to the data subcarriers. For equispaced comb-type pilots, (semi)-orthogonality of regressors is still guaranteed. 
Remark 3: Like all blind algorithms, we use one pilot bit to resolve the sign ambiguity (see references in Table ID. 



B. Determination of initial radius p, and r 

Our algorithm depends on p, and r which we need to determine. The receiver can easily estimate p by 
measuring the additive noise variance at its side. As for the channel covariance matrix R^, our simulations show 
that with carrier reordering we can replace R^ with identity with essentially no effect on performance. This becomes 
especially true in the high SNR regime. It remains to obtain an initial guess of the search radius r. To this emd, 
note that if h and X are perfectly known (with h drawn from J\f{0, Rh) but is known) then 

^ = \\h\\l-^ + \\y-VP diag{X) A^hf (27) 

is a chi-square random variable with k = 2{N + L + 1) degrees of freedorto Thus, the search radius should be 
chosen such that > r) < e, where > r) = 1 — F{r; k), and where F(r; k) is the cumulative distribution 



''The first term on the right hand side has 2(L + 1) degrees of freedom as h is Gaussian distributed wfiile the second term has 2N degrees 
of freedom as ^ — diag(Af) A^/i is just Gaussian noise. 
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function of the chi-square random variable given by 



Fir- k) = (28) 
^ '^^ T{k/2) ' ^ 



Here, r/2) is the lower incomplete gamma function defined as 



l-r/2 

7(A:/2, r/2) = / t^'"^'^ e'^ dt. (29) 





So, under this initial radius, we guarantee finding the MAP solution with probability at least 1 — e. In case a solution 
is not found, the algorithm doubles the value of r and starts over. This process continues until a solution is found. 
For example, when = 64, L = 15 and e = 0.01, the value of our radius should be set to 204. 

IV. An Approximate Blind Equalization Method 

There are two main sources that contribute to the complexity of the exact blind algorithm of Section |llll 

1) Calculating Pj.- the second step of the blind algorithm requires updating the metric M^^^ . This metric 
depends heavily on operations involving the (L + 1) x (L + 1) matrix Pi which are the most computationally 
expansive (see Table HI] which estimates the computational complexity of the RLS). 

2) Backtracking: When the condition M-^^ ^ < r is not satisfied, we need to backtrack and pursue another branch 
of the search tree. This represents a major source of complexity. 

In the following, we show how we can avoid calculating Pi all together. We postpone the issue of backtracking to 
Section jV] 

A. Avoiding Pi 

Note that in the RLS recursions (|20l)— (l24l). Pj always appears multiplied by a^. Let's see how this changes if we 
set P_i = I and assume that the a^'s are orthogonal or, in particular, if we assume that apaj+i = = 0. 

With these assumptions note that 

7(0) = . — = ^ (30) 

l + p |A'(0)PaHp_iao 1 + p |A'(0)|2(L + 1) 
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i.e., 7(0) is independent of Also note that 

Poai = P-iai- pj{0)\X{0)\^P_iaoa^P-iai 
= ai-/>7(0)|^(0)|2aoa^ai 

= ai. (31) 

For a similar reason 

Poa2 = a2. (32) 



From (I31I ). it is also easy to conclude that 

7(1) = ^ (33) 

1 + ^1-^(1)^(^ + 1) 

i.e., 7(1) is independent of Pq. Also, from ( [3T| ) and (|32l ) it follows that Pjaj+i = aj+i and Piai^2 = We 
now investigate what happens to Pj+i. 

Pi+iai+2 = Piai+2-P7(« + l)l'^'(^ + l)l^^iai+ia!+i-Piai+2 
= ai+2 - pj{i + + l)pai+ia^iai+2 

= ai+2- (34) 

Similarly, 

Pi+ifli+a = (35) 



So, by induction we see that each occurrence of PjOj in the recursion set (I20l)-(l23l) can be replaced with a^. This 
allows us to discard (l24l ). i.e., 

M^^^^ = M^^^_^^+ji{}\y{i)-^Xii)afh,_,\^ (36) 



hi = h,.^+gJy{i)-^Xii)afh,^, (37) 
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TABLE II 

Estimated computational cost per iteration of the RLS algorithm 



Term 


X 


+ 




/oX(i)a^h^ 1 


2L + 2 








1 
1 


1 

1 




P 


1 

i_ 




1 




1 
i 


1 

1 




I, 

hi 


r 1 


r 1 1 
iv + I 


1 
I 




r 2 1 r 1 1 






9i 


L + 3 








L + 1 


L 




7(i) 


3 


1 


1 


«fj'.-i 


L2 + 2L + 1 


L2 + L 




Pi 


L2 + 2L + 2 


L2 + 2L + 1 




Total per iteration 


+ IIL + 17 


2L^ + 5L + 4 


3 



where 



9i = ./p7ii)Xii)''a, (38) 

7(i) = ^ . (39) 

l + p\X{i)\^L + l) 

Thus, the approximate bhnd RLS algorithm is effectively running at LMS complexity. Table |ll] summarizes the 
computational complexity incurred in the RLS calculation. 



B. Avoiding Pi with Carrier Reordering 

The reduction in complexity above is based on two assumptions. The first assumption is to set P_i = / (instead 
of Rh) and the second is to assume that the consecutive a^'s are orthogonal. Note that the aj's are columns of 
A, i.e. they are partial FFT vectors. As such, strictly speaking, they are not orthogonal. Notice, however, that for 



U"-^(i-i')k) 



(40) 



A;=0 



which after straightforward manipulation can be shown to be 



\afai 



1 

L+l 



L + 

sin{Tr{i-i')j-) 



{Z = i') 
{i / Z') 



(41) 
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This is a function of (i — i') mod N . Thus, without loss of generality, we can set i' = 1 and plot this autocorrelation 
with respect to i. The autocorrelation decays with i as shown in Figure |2] We can use this observation in 




i 



Fig. 2. Autocorrelation vs i for = 64 and L — 15 

implementing our blind RLS algorithm. Specifically, note that the whole OFDM data is available to us and so we can 
visit the data subcarriers in any order we wish. The discussion above shows that the data subcarriers should be visited 
in the order i, i + A, i + 2 A, . . . where A should be chosen as large as possible to make a,, Oj+A, ttj+2A! • • • 
as orthogonal as possible, but small enough to avoid revisiting (or looping back to) a neighborhood too early. We 
found the choice A = to be a good compromise. From Figure [2l which plots (|4TI) for = 64 and L = 15, 
columns 1, 5, 9, 13, 17, 21, • • • ,61 are orthogonal to each other and so are the columns 2, 6, 10, 14, 18, • • • , 62. So, 
if the vectors are visited in the following order 1, 5, 9, 13, 17, 21, • • • , 61, 2, 6, 10, 14, 18, • • • , 62, • • • , then we have 
a consecutive set of vectors that are orthogonal. The only exception is in going from column 61 to 2. These two 
columns are not really orthogonal but are nearly orthogonal (the correlation of columns 1 and 61 is zero, so the 
correlation of 61 with 2 should be very small since the correlation function is continuous as shown in Figure H). 
In general, we chose A = and visit the columns in the order i + A,i + 2A, • • • ,i + LA, i = 1, - ■ ■ , A — 1. 

Our simulation results show that the BER we get with exact calculation of Pj and that obtained when we set 
P i = / with subcarrier reordering are almost the same. Table Hill gives the computational complexity incurred in 
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TABLE III 

Estimated computational cost per iteration of the RLS algorithm with Carrier Reordering 



Term 


X 


+ 






2L + 2 


L 




\yii)-^X{z)afh,^,\^ 


1 


1 




p-f{i) 


1 




1 




1 


1 




hi 


L + 2 


L + 1 


1 




3 


1 


1 


Total per iteration 


4L + 13 


2L + 4 


3 



the RLS calculation when subcarrier reordering is used (i.e., free from Pj calculation). 

Note that with subcarrier reordering, the new version of the RLS runs without the need to use the power delay 
profile statistics, which relieves us from the need to provide this information. 



V. Computational Complexity in the High SNR Regime 

In the section, we study the other source of complexity (backtracking) and show that there is almost no 
backtrackings in the high SNR regime. To this end, consider the behavior of the algorithm when processing the 
ith subcarrier. There are \Q\ different alphabet possibilities to choose from at this subcarrier and a similar number 
of possibilities at the preceding i — I subcarriers, creating a total of — 1 incorrect sequences A'(j) and one 
correct sequence ^(i)- The best case scenario is to have only one sequence that satisfies M^p^.^ < r in which case 
there would be only one node to visit. The worst case is having to visit the remaining — 1 wrong nodes before 
reaching the true sequence (visiting of nodes will happen through backtracking); this latter case is equivalent to the 
exhaustive search scenario (i.e., all possible sequences satisfy Mp^^,^ < r). Thus, if we let Cj denote the expected 
number of nodes visited at the ith subcarrier, then from above we can write 

a<i + - i)Pi (42) 



'The term "backtracking" refers to the case when the algorithm is currently at subcarrier i and it has to change the estimate of the 
data symbol at some subcarrier j < i. On the other hand, sweeping the constellation points at subcarrier to find the first one that satisfies 
^Xf^i-i < r is not considered backtracking. 
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where Pi is the maximum probabiUty that an erroneous sequence of symbols A'(j) 7^ A'(j) has a cost less than r. 
We will show that this probability becomes negligibly small at high SNR values. Recall that 



y^i) = Vpdiag(A'(,))Ag)/i +Ar(, 



i) 



(43) 



where J^(i) denotes the first i symbols of Af. Note the (l43l) can be written as 



Vpdiag(Af(i))AH / 



h 



(44) 



We first prove our claim for the least squares (LS) cost and then show how the MAP cost reduces to LS cost for 
high SNR. 



A. LS cost 

Suppose we have an erroneous sequence of symbols A'(j) 7^ '^(i)- The LS estimate of h is found by minimizing 
the objective function 

Jls = ^min^ - diag(Ar(i))A« /if} (45) 

and the solution of h is (see ll38l . Chapter 12, pp. 664) 

h = [A(i)diag(A'|^))diag(i'(i))Ag)]-i^ A(i)diag(A'J^)):V(i). (46) 

The cost associated with the LS solution is given by (see 1381 . Chapter 11, pp. 663) 

^^(.) = Vpdiag(A'(,))A« (VpA(,)diag(A'(,))«Vpdiag(-*(,))AH)"'vpA,)diag(A'J]))):y;(,) 

= (l-p diag(A'(,))AH (/, A(,)|diag(Af(i))|2Ag))~' A^^d\s.g{X%))y 

Mx,, = yl){l-D)y^^ (47) 



where 



D = diag(A'(,))AH (A(,)|diag(A'(,))|2AH j A(,)diag(A'^)). 
So the probabiUty that the sequence A'(j) satisfies Mp^^,^ < r reads 



= Pr(M^^^, < r) 



In the strict sense of the word, backtracking means visiting Step 3 in our algorithm. Substituting (|44] 



//r 



Pi = Pr 



VV 



h 



G 



h 



\ \ 

< r 



/ 



where 



G 



(0 



^ A(i)diag(A'(j)) 



\I-D] 



^diag(i'(i))Aj^) / 



Let B = diag(Ar(j))A|^-j, then can be written as 



pB^[I - D]B B^ [I -D]I 
III- D]B I\I -D]I 



which in compact form can be expressed as 



G 



pE E2 



^2 -^3 



Using the Chemoff bound the right hand side of (l50l ) can be bounded in the following way 







/ 


h 


H 


h 


\ 




Pi < e'^^'E 


exp 






Git) 












v 




















/ 
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Noting that 



with 



h 



'AA(0,S(,)) 



(55) 



Rh 
I, 



(56) 



we can solve the expression in (154] ) as 



exp 





h 


H 


h 


\ 




/ 


h 


H 


h 


\ 












exp 












v 












v 
















/ 








) 



dhdAf 



fir ~r-{L+i+l) 



e f^' vr 



exp 



/ 


h 


H 


h 


\ 














dhdAf (j) 


V 








/ 













exp 



h 

AT, 



(i) 



g— /iryp(L+i+l) 

\ 



dhdAf, 



(i) 



/jr7]-(L+i+l) 



(57) 



Note that the numerator in (157] ) is a multi-variate complex Gaussian integral. Recall that an n-dimensional complex 
Gaussian integral has the solution (see |[T9l ) 



exp I - ||x||^ 



dx 



■K" 



det(VF) 



(58) 



This allows us to simplify (ISTt as 



det(S(j) +/iG(i)) 



(59) 



Next, we show that the probability — )• as p — )• oo. To show this, we just need to show that the largest eigenvalue 
of the term in the denominator goes to infinity as p — )^ cxd. 



Lemma 1: Let E = A(j)diag(A'(j))[/ — Z)]diag(A'(j))A^^ be a (L + 1) x + matrix, then for any sequence 
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Pt'(j), E has a positive maximum eigenvalue, A^ax and a corresponding unit-norm eigenvector v of size (L + 1) x 1. 
Proof: Recall that 

D = diag(A'(,))Ag) (A(,)diag(A'[l))diag(A'(,))AH A(,)diag(A'^)) (60) 

and let F = diag(A'(j))A|^^, then we can write the above equation as 

D = F (F^F) ~^ F^ = FF'f (61) 

where F^ = (^F^F) ^ F^ is the Moore-Penrose pseudo-inverse^ (see BTI . Chapter 5, pp. 422). Therefore, D is 
an idempotent matrix with eigenvalues equal to either or 1 BOl and hence, [/ — D] is also a positive semi-definite 
idempotent matrix. Note also that the matrix E in (1531 ) can be written as 



E = A(i)diag(Af|;))[J-r>]diag(Ar(i))Ag) 



B^[I-D]B (62) 



and 

z^Ez = z^B^[I - D]Bz = (-Bz)"[/ - D]{Bz) > (63) 
and so E is Hermitian and positive semi-definite. 

Let U = [ui U2 • • • ul_|_i] be a (L + 1) X (L + 1) unitary matrix where Uj is the ith eigenvector then, 
E = UAU^ where A is a diagonal matrix containing ordered eigenvalues of E such that Ai > A2 > • • • > A^+i. 



^the columns of F are linearly independent. 
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Let z = [/^v, then the maximum eigenvalue of E is given as 



max v^£^v = max z^Az (64) 

v||2 = l ||z||2 = l 



L+1 



max 

|z||2 = 



c y,X^\zi\^ (65) 

1=1 
L+1 

< max Ai > IzjP (66) 

|z||2 = l ^ 
1 = 1 

< Ai = Amax (67) 



The equaUty is attained when v is the eigenvector of An 



Lemma 2: Given that E has a positive maximum eigenvalue A^ax with corresponding unit-norm vector v of 
size (L + 1) X 1, then the maximum eigenvalue of in (|52l ) is lower bounded by w^G(j)W = p Amax where 



w 



V(L+l)xl 
Oixl 



(68) 



Proof: From Lemma 1, the largest eigenvalue of E is Amax- It follows that the largest eigenvalue of pE is 
pAmax- Let A'max be the largest eigenvalue of From (l53l) . we can see that pE is a principal sub-matrix of 
(see |41], Chapter 7, pp. 494) and thus 

(69) 

i.e., the largest eigenvalue of the principal sub-matrix pE is smaller than or equal to the largest eigenvalue of 
(see fin . Chapter 7, pp. 551-552). Thus pAmax is a lower bound on the largest eigenvalue of ■ 



Note that is positive definite as it is a covariance matrix, hence it will have positive eigenvalues. From Lemma 
2, the maximum eigenvalue of Amax — oo as p — oo. Thus the denominator in ( [59l ) grows to infinity in the 

limit p — )• oo and 

lim (70) 

p— i>00 



22 



From (1421 ) and (1701 ). we have 



lim Ci < 1 + - 1) lim Pi (71) 
lim Ci < 1 (72) 

p— >CJO 



S. MAP cost 



The cost associated with the MAP solution of an erroneous sequence of symbols A'(j) ^ A'(j) is given as (see 
21, Chapter 11, pp. 672) 



(73) 



Mathematically, 



= PrU^||J/ + pdiag(A'(,))A||)/?;,A(i)diag(Ar^))j y^i^< 



(74) 



By matrix inversion lemma 



+ y/p diag(A'(i))Ag)/?,,A(i)diag(A'(i)) 



-1 



I-p diag(A'(,))A[^) + p A(,)diag(A'^))diag(A'(,))A[|)J A(,)diag(A'^)) (75) 
/-diag(A'(i))A}^) i 7?^^ + A(i)diag(ArJ]))diag(A'(,))A}^) A(i)diag(Ar|])) 



7- i:) 



(76) 



where 



£> = diag(A'(i))A 



^ 7?^^ + A(i)diag(A'J]))diag(Ar(i))Ag) 



-1 



A(i)diag(A'(i)) 



(77) 



Thus (1741) can be written as 



(78) 
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TABLE IV 

Total computational cost of the ML blind and training based algorithms at high SNR 



\ Icrrkr'il'liiTi 
JrVl^ljl ILillll 


V 


_i_ 


Blind Algorithm 


(3L2 + 11L + 17)iV 


(2L2 + 5L + 4)N 


Blind algorithm 
with 

carrier reordering 


(4L + 13)A^ 


{2L + A)N 


Training based 
algorithm 1391 


4L2 + 17L + 13 


2L2 + 6L + 4 



note that dTSl ) is of the same form as ( |49l l. The only difference in the LS and MAP costs is the presence of the 
term ^ Rf^^ in (TTTI ). Also note that this term depends on the inverse of the SNR. For low SNR, the inverse term in 
(TTTI ) is always invertible due to the regularization term. At high SNR, the effect of regularization fades and inverse 
term in (TTTI) is invertible. At high SNR, i.e., p ^ oo, ^ R~^^ — )• and D of (1761 ) takes the same form as that of 
LS cost leading to (TTIl l. 

Table |IV] lists the estimated computational cost for our blind algorithm in the high SNR regime. Since there is 
no backtracking, the total number of iterations is A^, which explains our calculations in Table JV] It thus follows 
that the total number of operations needed for our algorithm is of the order 0{LN) in high SNR regime. The pilot 
based approach for channel estimation needs to invert an (L + 1) x (L + 1) matrix (assuming we need L + 1 pilots 
to estimate a channel of length L + 1) with a complexity of the order O(L^). Since the cyclic prefix is a fixed 
fraction of the OFDM symbol {L = N/ m with m typically set to m = 4 or 8) we see that the complexity of the 
two approaches become comparable in the high SNR regime. 

VI. Simulation Results 

We consider an OFDM system with = 16, or 64 subcarriers and a CP of length L = ^. The uncoded data 
symbols are modulated using BPSK, 4-QAM, or 16-QAM. The constructed OFDM signal then passes through a 
channel of length L + 1, which is assumed to be block fading (i.e., constant over one OFDM symbol but fades 
independently from one symbol to another) and whose taps follow an exponential decay profile = e~'^'^*). 
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A. Bench marking 

We compare the performance of our algorithm against the following receivers 

1) the subspace-baseqj blind receiver of lITOll . 

2) the sphere decoding based receiver of |[28l . 

3) a receiver that acquires the channel through training with L + 1 pilots and a priori channel correlation 

m, 

4) the ML receiver that acquires data through exhaustive search. 
The simulations are averaged over 500 Monte-Carlo runs. 

Figure [3] compares the BER performance of our algorithm with the aforementioned algorithms for an OFDM 
system with = 16 subcarriers and BPSK data symbols. Note in particular that our blind algorithm outperforms 
both the subspace and sphere decoding algorithms and almost matches the performance of the exhaustive search 
algorithm for low and high SNR, which confirms the ML nature of the algorithm. 

Figure |4l which considers the 4-QAM case, shows the same trends observed for the BPSK case of Figure [3] 




15 

SNR(dB) 



Fig. 3. BER vs SNR for BPSK OFDM over a Rayleigh channel with = 16 and L = 3 



^The block fading assumption is maintained for all simulations. However, for the subspace blind receiver of |10| to work, the channel 
needs to stay constant over a sequence of OFDM symbols. For this particular receiver, the channel was kept fixed over 50 OFDM symbols. 
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Fig. 4. BER vs SNR for 4-QAM OFDM over a Rayleigh channel with iV = 16 and L = 3 



Figure [5] considers a more realistic OFDM symbol length {N = 64), drawn from a 4-QAM constellation and 
allows the SNR to grow to 45 dB. Our blind algorithm shows no error floor signs, which is characteristic of non-ML 
methods. Furthermore, the algorithm beats the training-based method and follows closely the performance of the 
perfect channel case. Figure [6] shows the results with N = 64 subcarriers and 16-QAM data symbols for SNR as 
large as 50 dB. Again, the proposed blind algorithm does not reach an error floor. 

B. Low-Complexity Variations 

In this subsection, we investigate the low-complexity variants of our algorithm. Specifically, we consider the 
performance of the blind algorithm with 

1) Pi set to /, 

2) Pi set to / with subcarrier reordering 

Figure |7] exhibits the comparisons for the various algorithms for BPSK and = 16. Note that with Pi set to / 
arbitrarily, the performance of the blind algorithm deteriorates and the BER reaches an error floor. Contrast this 
with the algorithm variant that uses subcarrier reordering as well, and note that the performance of this variant 
follows closely the performance of the exact blind algorithm. Also note that the BER of both of these algorithms 



26 




SNR(clB) 

Fig. 5. BER vs SNR for 4-QAM OFDM over a Rayleigh channel with iV = 64 and i = 15 




SNR(dB) 

Fig. 6. BER vs SNR for 16-QAM OFDM over a Rayleigh channel with iV = 64 and L = 15 



beats that of the sphere decoding algorithm of |[28l . The same trends are observed in Figure [H which considers the 
4-QAM case. 

Figure |9] compares the average runtime of various algorithms as a function of the SNR. Note first that the extreme 
cases are the training-based receiver and the exhaustive search receiver, both of which are independent of the SNR. 
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SNR(dB) 

Fig. 7. Comparison of low-complexity algorithms for BPSK OFDM with = 16 and L = 3 



The runtime of the proposed algorithm decreases with the SNR and is sandwiched in-between the run time of the 
sphere decoding algorithm and that of the subspace algorithm for all values of the SNFQ Note that in the high 
SNR regime our algorithm runs at the same speed as the subspace algorithm. 

Figure [TO] shows the average runtime of the proposed algorithm with = 16 for various modulation schemes 
(BPSK, 4-QAM and 16-QAM). It is clear from the figure that the average runtime decreases considerably at higher 
SNR values. 

VII. Conclusion 

In this paper, we have proposed a low-complexity blind algorithm that is able to deal with channels that change 
on a symbol by symbol basis allowing it to deal with fast block fading channels. The algorithm works for general 
constellations and is able to recover the data from output observations only. Simulation results demonstrate the 
favorable performance of the algorithm for general constellations and show that its performance matches the 
performance of the exhaustive search for small values of N. 

^The runtime of the subspace algorithm is adjusted to account for the fact that it requires the channel to be constant over a block of L + 1 
OroM symbols. 
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Fig. 8. Comparison of low-complexity algorithms for 4-QAM OFDM with 16 and L = 3 
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Fig. 9. Average time comparison for BPSK data symbols with A^ = 16 and L = 3 



We have also proposed an approximate blind equalization method (avoiding Pj with subcarrier reordering) to 
reduce the computational complexity. As evident from the simulation results, this approximate method performs 
quite close to the exact blind algorithm and can work properly without a priori knowledge of the channel statistics. 
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15 20 
SNR(dB) 

Fig. 10. Average Time Comparison for our Blind Algorithm for Different Modulation with iV = 16 and L = 3 



Finally, we study the complexity of our blind algorithm and show that it becomes especially low in the high SNR 
regime. 
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Abstract 

This paper proposes a low-complexity algorithm for blind equalization of data in OFDM-based wireless systems 
with general constellation. The proposed algorithm is able to recover data even when the channel changes on a 
symbol-by-symbol basis, making it suitable for fast fading channels. The proposed algorithm does not require any 
statistical information of the channel and thus does not suffer from latency normally associated with blind methods. 
We also demonstrate how to reduce the complexity of the algorithm, which becomes especially low at high SNR. 
Specifically, we show that in the high SNR regime, the number of operations is of the order 0{LN), where L is the 
cyclic prefix length and N is the total number of subcarriers. Simulation results confirm the favorable performance 
of our algorithm. 

Index Terms 

OFDM, channel estimation, maximum-likelihood detection, maximum a posteriori detection and recursive least 
squares. 
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I. Introduction 

Modem wireless communication systems are expected to meet an ever increasing demand for high data rates. A 
major hindrance for such high data rate systems is multipath fading. Orthogonal frequency division multiplexing 
(OFDM), owing to its robustness to multipath fading, has been incorporated in many existing standards (e.g., IEEE 
802.11, IEEE 802.16, DAB, DVB, HyperLAN, ADSL etc.) and is also a candidate for future wireless standards 
(e.g., IEEE 802.20). All current standards use pilot symbols to obtain channel state information (needed to perform 
coherent data detection). This reduces the bandwidth available for data transmission, e.g., the IEEE 802. lln standard 
uses 4 subcarriers for pilots, that is 7.1% of the available bandwidth, of the 56 subcarriers available for transmission. 
Blind equalization methods are advantageous as they do not require regular training/pilots symbols, thus freeing up 
valuable bandwidth. 

Several works exist in literature on blind channel estimation and equalization. A brief classification of these 
works based on a few commonly used constraints/assumptions is given in Table U (note that this list is not 
exhaustive). Broadly speaking, the literature on blind channel estimation can be classified into maximum-likelihood 
(ML) methods and non-ML methods. 

The non-ML methods include approaches based on subspace techniques ifTl- lfTOl . second-order statistics ifTTI . 
ifTll . |fT3l . Cholesky factorization llT4l . iterative methods ifHl . virtual carriers |16| real signal characteristics ifTTl 
and linear precoding |[T2l . ifTSl . Subspace-based methods Hl-llll, f/l- lfTOl generally have lower complexity but 
suffer from slow convergence as they require many OFDM symbols to get an accurate estimate of the channel 
autocorrelation matrix. Blind methods based on second-order statistics IfTTI . |[T2l . |fT3l also require the channel to 
be strictly stationary over several OFDM blocks. More often than not, this condition is not fulfilled in wireless 
scenarios (e.g., as in WLAN and fixed wireless applications). Methods based on Cholesky's factorization |[T4l and 
iterative techniques ifTSll suffer from high computational complexity. 

Several ML-based blind methods have been proposed in literature (201, Ull, ||2Tl-||35l, ||37l. Although they 
incur a higher computational cost, their superior performance and faster convergence is very attractive. These 
characteristics make this class of algorithms suitable for block fading scenarios with short channel coherence time. 
Usually, suboptimal approximations are used to reduce the computational complexity of ML-based methods. Though 
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TABLE I 

Literature Classification 





T imi t"f*H r\\7 

LjiiiiiLcu. uy 


i>HJL 111111 LCU u y 


Channel constant over 
ivl symDois, M > i 


CI, m, nil, 151, 161, 0, 13, M, 

m, flu, HI, mi, nisi, na, 

LL/Ji, LLol, IIAUi, yjj, ||24||, gz/i, UM 


m, ETi 


Uses pilots 
to resolve 
phase ambiguity 


m. El, a, mni, mi, 
m, nsi, ma, m, m\, 
m, m, f25i, pel, [371 




Constant modulus constellation 


[21, [31, [61, [91, [121, [131, 

m, mi, ii2Qi, m, 
isa, iia, im, usa, wi 


m, a, in, uni, m 
El, m. Hill 



these methods reduce the complexity of the exhaustive ML search, they still incur a significantly high computational 
cost. Some methods like |[2TI . Il23l . Il24l are sensitive to initialization parameters, while others work only for specific 
constellations (see Table lU). A few ML-based algorithms allow the channel to change on a symbol-by-symbol basis 
(e.g., Il26l . Wit ), however, these algorithms are only able to deal with constant modulus constellations. 

To the best of our knowledge no bUnd algorithm in literature is able to deal with channels that change from one 
OFDM symbol to another when the data symbols are drawn from a general constellation. Contrast this with the 
equalization algorithm presented in this paper. The key features of the blind equalization algorithm presented in 
this paper are that it 



1) works with an arbitrary constellation, 

2) can deal with channels that change for one symbol to the next, 

3) does not assume any statistical information about the channel. 



In addition, we propose a low-complexity implementation of the algorithm by utilizing the special structure of 
partial FFT matrices and prove that the complexity becomes especially low in the high SNR regime. 

The paper is organized as follows. Section Ull describes the system model and Section |lll] describes the blind 
equaUzation algorithm. Section |IV] presents an approximate method to reduce the computational complexity of the 
algorithm, while Section |V] evaluates this complexity in the high SNR regime. Section |Vl] presents the simulation 
results and Section IVIII gives the concluding remarks. 



4 



A. Notation 

We denote scalars with small-case letters, x, vectors with small-case boldface letters, x, while the individual 
entries of a vector h are denoted by h{l). Upper case boldface letters, X, represent matrices while calligraphic 
notation, X, is reserved for vectors in the frequency domain. A hat over a variable indicates an estimate of the 
variable, e.g., /i is an estimate of h. (.)^ and (.)^ denote the transpose and Hermitian operations, while the notation 
stands for element-by-element multiplication. The discrete Fourier transform (DFT) matrix is denoted by Q and 
defined as qi^k = e~^^^''~^^^'^~^^ with k,l = 1,2, - ■ ■ ,N (N is the number of subcarriers in the OFDM symbol), 
while the invrse DFT (IDFT) is denoted as Q^. The notation ||a||g represents the weighted norm defined as 
II a II 3 = a^Ba for some vector a and matrix B. 



II. System Model 



Consider an OFDM system where all the N available subcarriers are modulated by data symbols chosen from 
an arbitrary constellation. The frequency-domain OFDM symbol X, of size N x 1, undergoes an IDFT operation 
to produce the time-domain symbol x, i.e. 

X = VnQ^X. (1) 

The transmitter then appends a length L cyclic prefix (CP) to x and transmits it over the channel. The channel 
h, of maximum length L + 1 < N, is assumed to be constant for the duration of a single OFDM symbol, but 
could change from one symbol to the next. The received signal is a convolution of the transmitted signal with the 
channel observed in additive white circularly symmetric Gaussian noise n ~ M{0, 1). The CP converts the linear 
convolution relationship to circular convolution, which, in the frequency domain, reduces to an element-by-element 
operation. Discarding the CP, the frequency-domain received symbol is given by 

y = ^/pnQX+^r, (2) 



where p is the signal to noise ratio (SNR) and y, H, X ,J\f , are the -point DFT's of y, h, x, and additive noise 
n respectively, i.e. 



U = Q 



h 




1 



1 



1 



X = —=Qx, J\f = —=Qn, and 3^ = —=Qy 



N 



N 



(3) 



Note that h is zero padded before taking its A'^-point DFT. Let consist of first L + 1 columns of Q (i.e., A 
consist of first L + 1 rows of Q^), then 



n = A^h and h = AH. 



(4) 



This allows us to rewrite (O as 



y = ^d.i&g{X)A^h + J^. 



(5) 



III. Blind Equalization Approach 



Consider the input/output equation ([S]), which in its element by element form reads 

y{j) = ^pX{3)a}^h+M{3) (6) 

where aj is the jth column of A. The problem of joint ML channel estimation and data detection for OFDM 
channels can be cast as the following minimization problem 



Jml = min 11;^ - diag(A')A"/i"^ 

N 



min y |3^(i)- VP'^'Waf^P 
1=1 

(i N 
J2 \y{j) - v~p xij)afh\' + J2 \yu) - Vp xij)afhf \ (v 



j=l j=i+l 



where denotes the set of a 
A'(j) up to the time index i, i.e 



1 possible A^— dimensional signal vectors. Let us consider a partial data sequence 



and define Mx^^^ as the corresponding cost function, i.e. 

M;f„ = min ||3^(,) - ^ diag(A'(,))Ag)/if, (8) 

where consists of the first i rows of A^. 

In the following, we pursue an idea for blind equalization of single-input multiple-output systems first inspired 
by lfT9l . Let R be the optimal value for the objective function ^ (we show how to determine R in Section UlI-BI 
further ahead). If Mx^^-, > R, then A'(j) can not be the first i symbols of the ML solution to To prove 

this, let and denote the ML estimates and suppose that our estimate A'(j) satisfies 

^(i) = (i) (9) 
i.e. the estimate A'(j) matches the first i elements of the ML estimate. Then, we can write 

Hi,||2 



R = min \\y - ^ diag(X)A"-h 

h,xen'^ II V 



TV 



= - diag(A'5^)AH + \yij) - X^\j)afh^y 

j=i+i 

N 

= ||3^«-^/^diag(A'(,))A«/x'''f + ^ \yij)-^pX^Hj)afh'''^\\ (10) 

j=i+i 

where the last equation follows from Now, clearly 

\\y^i)-^dmg{X^i))Af,^h^'^f > rnin||:V(i)- Vpdiag(A'(,))Af^)/if (11) 

= ||3^(i)- Vpdiag(A'(,))AH^f, (12) 



'Thus, for example X^2) = [X{l),X{2)f and X^n) = [-^^(1), ■ • ■ , X{N)f = X. 



where h is the argument that minimizes the RHS of ([TTI ). Then 

TV 

j=i+l 

> rnin \\y^i^ - ^ diag(A'(i))A{^)h,f 

= M;,,,,. (13) 

So, for A'(j) to correspond to the first i symbols of the ML solution , we should have ^X(.^ < Note that 
the above represents a necessary condition only. Thus if Af(j) is such that ^ < R, then this does not necessarily 
mean that A'(j) coincides with X^-^. 

This suggests the following method for blind equalization. At each subcarrier frequency i, make a guess of 
the new value of and use that along with previous estimated values A'(l), (i — 1) to construct 



Estimate h so as to minimize M^, in (113] ) and calculate the resulting minimum value of M-i, . If the value of 

— ° ^(i) 

M^^^ < R, then proceed to i + 1. Otherwise, backtrack in some manner and change the guess of for some 
j < i. A problem with this approach is that for i < L + 1, given any choice of X{i), h can always be chosen by 
least-squares to make M-^^^ in ([T3] ) equal to zercQ Then, we will need at least L + 1 pilots defying the blind nature 
of our algorithm. Alternatively, our search tree should be at least L + 1 deep before we can obtain a nontrivial (i.e. 
nonzero) value for M-o . 

An alternative strategy would be to find h using weighted regularized least squares. Specifically, instead of 
minimizing the objective function J ml in equation (O, we minimize the maximum a posteriori (MAP) objective 
function 

Jmap= min {\\h\\l-^ + \\y -^dmg{X)A^hf\ (14) 
where Rh is the autocorrelation matrix of h (in Section ITVl we modify the blind algorithm to avoid the need for 



^Since A^^j is full rank for i < L + 1, diag(Af(i))A(^) is full rank too for each choice of diag(Af(i)) and so we will always find some 
h that will make the objective function in l ll3t zero (since h has L+1 degrees of freedom). 



channel statistics). Now the objective function in ([141 ) can be decomposed as 



Jmap = mill < 



N 



j=i j=i+i 



Given an estimate of the cost function reads 



^^•(,.1, = + 113^ - ^diag(Af(i_i))Ag_i)/if } 



mmse= [7?;^^ +pA(i_i)diag(i'(i_i))"diag(A'(i_i))Ag_ ' ^ 



Hi-i)J 



(15) 



(16) 

with the optimum value (see II38I . Chapter 12, pp. 671) 

h = y/p /?/jA(j„i)diag(A'(j„i))[/ + p diag(A'(j_ i))A«_,)/?,A(,_i)diag(A'(,_i))]-i3^(,_i) (17) 
and corresponding minimum cost (MMSE error) 



(18) 



If we have a guess of X{i), we can update the cost function and obtain ^x^.^ In fact, the cost function Af^ is 
the same as that of ^ with the additional observation y{i) and an additional regressor X{i)af, i.e. 

2 . 









y{i-i) 




diag(A'(,_i))AH 


M-{, = mill < 

'^(o h 



















h 



(19) 



We can thus, recursively update the value M^x^-^ based on M^^ using recursive least squares (RLS) 11381 . i.e. 



(20) 



hi = hi^i + Qi [y[i) - ^ X{i)afhi^i 



(21) 
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where 

9i = ./pi{i)X{ifP,.iai (22) 

lii) = (23) 

l + p\X{i)\^afP,^^a, 

Pi = P,_i-/5 7(i)|i(i)|2p,_ia,afPi_i (24) 

These recursions apply for all i and are initialized by 

M^^_^^ = 0, P_i = Rh, and h_i = 

Now, let R be the optimal value for the regularized objective function in (fT4l ). If the value R can be estimated, we 
can restrict the search of the blind MAP solution X to the offsprings of those partial sequences A'(j) that satisfy 
My, < R. This forms the basis for our exact blind algorithm described below. 

A. Exact Blind Algorithm 

In this subsection, we describe the algorithm used to find the MAP solution of the system. The algorithm employs 



the above set of iterations (EOt — (124]) to update the value of the cost function M-j^^ ^ which is then compared with 
the optimal value R. The input parameters for the algorithm are: the received channel output y, the initial search 
radius r, the modulation constellatioiu ^ and the IxA^ index vector /. 

The algorithm is described as follows (the algorithm is also described by the flowchart in Figure [l]) 

1) (Initialize) Set i = 1, = 1 and set X{i) = 

2) (Compare with bound) Compute and store the metric M-j^^^. If Af^^^ > r, go to 3; else, go to 4; 

3) (Backtrack) Find the largest 1 <j <i such that 

< If there exists such j, set i = j and go to 5; else go to 6. 

4) (Increment subcarrier) If i < set i = i + = 1, = and go to 2; else store current 
A'(^), update r = M.^^^^ and go to 3. 

^Examples of the modulation constellation are Q, are 4-QAM and 16-QAM. We use \Q\ to denote the constellation size and Q{k) for the 
fcth constellation point. For example, in 4-QAM \Q\ =4 and f2(l), • • ■ , f2(4) are the four constellation points of 4-QAM. The indicator 
refers to the last constellation point visited by our search algorithm at the ith subcarrier. 
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5) (Increment constellation) Set = + 1 and X{i) = Go to 2. 

6) (End/Restart) If a full-length sequence ^(^n) has been found in Step 4, output it as the MAP solution and 
terminate; otherwise, double r and go to 1. 

The essence of the algorithm is to eliminate any choice of the input that increments the objective function beyond 
the radius r. When such a case is confronted, the algorithm backtracks (Step 3 then Step 5) to the nearest subcarrier 
whose alphabet has not been exhausted (the nearest subcarrier will be the current subcarrier if its alphabet set is 
not exhausted). 

The other dimension the algorithm works on is properly sizing r; if r is too small such that we are not able to 
backtrack, the algorithm doubles r (Step 3 then Step 6). If on the other hand r is too large that we reach the last 
subcarrier too fast, the algorithm reduces r to the most recent value of the objective function, (r = Mx^^^^) and 
backtracks (Step 4 then Step 3). 




Fig. 1. Flowchart of the blind algorithm. 
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Remark 1: The backtracking algorithm depends heavily on calculating the cost function using (|20l)-(l24l). In the 
constant modulus case, the values of ^[^^(i)^ in equations ( [23] ) and (l24l) become constant (equal to p Ex) for all 
i, and the values of ^{i) and Pi become 

= ^^ r ^Hp (25) 

Pi = P^^l- p£xl{i)P^-la^afPi^l, (26) 

which are independent of the transmitted signal and thus can be calculated offline. 

Remark 2: The algorithm can also be used for a pilot-based standard. In this case, when the algorithm reaches a 
pilot holding-subcarrier, no backtracking is performed as the value of the data carrier is known perfectly. In the 
presence of pilots, it is wise to execute the algorithms over the pilot-holding subcarriers first and subsequently move 
to the data subcarriers. For equispaced comb-type pilots, (semi)-orthogonality of regressors is still guaranteed. 
Remark 3: Like all blind algorithms, we use one pilot bit to resolve the sign ambiguity (see references in Table ID. 



B. Determination of initial radius p, and r 

Our algorithm depends on p, and r which we need to determine. The receiver can easily estimate p by 
measuring the additive noise variance at its side. As for the channel covariance matrix R^, our simulations show 
that with carrier reordering we can replace R^ with identity with essentially no effect on performance. This becomes 
especially true in the high SNR regime. It remains to obtain an initial guess of the search radius r. To this emd, 
note that if h and X are perfectly known (with h drawn from J\f{0, Rh) but is known) then 

^ = \\h\\l-^ + \\y-VP diag{X) A^hf (27) 

is a chi-square random variable with k = 2{N + L + 1) degrees of freedorto Thus, the search radius should be 
chosen such that > r) < e, where > r) = 1 — F{r; k), and where F(r; k) is the cumulative distribution 



''The first term on the right hand side has 2(L + 1) degrees of freedom as h is Gaussian distributed wfiile the second term has 2N degrees 
of freedom as ^ — diag(Af) A^/i is just Gaussian noise. 
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function of the chi-square random variable given by 



Fir- k) = (28) 
^ '^^ T{k/2) ' ^ 



Here, r/2) is the lower incomplete gamma function defined as 



l-r/2 

7(A:/2, r/2) = / t^'"^'^ e'^ dt. (29) 





So, under this initial radius, we guarantee finding the MAP solution with probability at least 1 — e. In case a solution 
is not found, the algorithm doubles the value of r and starts over. This process continues until a solution is found. 
For example, when = 64, L = 15 and e = 0.01, the value of our radius should be set to 204. 

IV. An Approximate Blind Equalization Method 

There are two main sources that contribute to the complexity of the exact blind algorithm of Section |llll 

1) Calculating Pj.- the second step of the blind algorithm requires updating the metric M^^^ . This metric 
depends heavily on operations involving the (L + 1) x (L + 1) matrix Pi which are the most computationally 
expansive (see Table HI] which estimates the computational complexity of the RLS). 

2) Backtracking: When the condition M-^^ ^ < r is not satisfied, we need to backtrack and pursue another branch 
of the search tree. This represents a major source of complexity. 

In the following, we show how we can avoid calculating Pi all together. We postpone the issue of backtracking to 
Section jV] 

A. Avoiding Pi 

Note that in the RLS recursions (|20l)— (l24l). Pj always appears multiplied by a^. Let's see how this changes if we 
set P_i = I and assume that the a^'s are orthogonal or, in particular, if we assume that apaj+i = = 0. 

With these assumptions note that 

7(0) = . — = ^ (30) 

l + p |A'(0)PaHp_iao 1 + p |A'(0)|2(L + 1) 
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i.e., 7(0) is independent of Also note that 

Poai = P-iai- pj{0)\X{0)\^P_iaoa^P-iai 
= ai-/>7(0)|^(0)|2aoa^ai 

= ai. (31) 

For a similar reason 

Poa2 = a2. (32) 



From (I31I ). it is also easy to conclude that 

7(1) = ^ (33) 

1 + ^1-^(1)^(^ + 1) 

i.e., 7(1) is independent of Pq. Also, from ( [3T| ) and (|32l ) it follows that Pjaj+i = aj+i and Piai^2 = We 
now investigate what happens to Pj+i. 

Pi+iai+2 = Piai+2-P7(« + l)l'^'(^ + l)l^^iai+ia!+i-Piai+2 
= ai+2 - pj{i + + l)pai+ia^iai+2 

= ai+2- (34) 

Similarly, 

Pi+ifli+a = (35) 



So, by induction we see that each occurrence of PjOj in the recursion set (I20l)-(l23l) can be replaced with a^. This 
allows us to discard (l24l ). i.e., 

M^^^^ = M^^^_^^+ji{}\y{i)-^Xii)afh,_,\^ (36) 



hi = h,.^+gJy{i)-^Xii)afh,^, (37) 
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TABLE II 

Estimated computational cost per iteration of the RLS algorithm 



Term 


X 


+ 




/oX(i)a^h^ 1 


2L + 2 








1 
1 


1 

1 




P 


1 

i_ 




1 




1 
i 


1 

1 




I, 

hi 


r 1 


r 1 1 
iv + I 


1 
I 




r 2 1 r 1 1 






9i 


L + 3 








L + 1 


L 




7(i) 


3 


1 


1 


«fj'.-i 


L2 + 2L + 1 


L2 + L 




Pi 


L2 + 2L + 2 


L2 + 2L + 1 




Total per iteration 


+ IIL + 17 


2L^ + 5L + 4 


3 



where 



9i = ./p7ii)Xii)''a, (38) 

7(i) = ^ . (39) 

l + p\X{i)\^L + l) 

Thus, the approximate bhnd RLS algorithm is effectively running at LMS complexity. Table |ll] summarizes the 
computational complexity incurred in the RLS calculation. 



B. Avoiding Pi with Carrier Reordering 

The reduction in complexity above is based on two assumptions. The first assumption is to set P_i = / (instead 
of Rh) and the second is to assume that the consecutive a^'s are orthogonal. Note that the aj's are columns of 
A, i.e. they are partial FFT vectors. As such, strictly speaking, they are not orthogonal. Notice, however, that for 



U"-^(i-i')k) 



(40) 



A;=0 



which after straightforward manipulation can be shown to be 



\afai 



1 

L+l 



L + 

sin{Tr{i-i')j-) 



{Z = i') 
{i / Z') 



(41) 
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This is a function of (i — i') mod N . Thus, without loss of generality, we can set i' = 1 and plot this autocorrelation 
with respect to i. The autocorrelation decays with i as shown in Figure |2] We can use this observation in 




i 



Fig. 2. Autocorrelation vs i for = 64 and L — 15 

implementing our blind RLS algorithm. Specifically, note that the whole OFDM data is available to us and so we can 
visit the data subcarriers in any order we wish. The discussion above shows that the data subcarriers should be visited 
in the order i, i + A, i + 2 A, . . . where A should be chosen as large as possible to make a,, Oj+A, ttj+2A! • • • 
as orthogonal as possible, but small enough to avoid revisiting (or looping back to) a neighborhood too early. We 
found the choice A = to be a good compromise. From Figure [2l which plots (|4TI) for = 64 and L = 15, 
columns 1, 5, 9, 13, 17, 21, • • • ,61 are orthogonal to each other and so are the columns 2, 6, 10, 14, 18, • • • , 62. So, 
if the vectors are visited in the following order 1, 5, 9, 13, 17, 21, • • • , 61, 2, 6, 10, 14, 18, • • • , 62, • • • , then we have 
a consecutive set of vectors that are orthogonal. The only exception is in going from column 61 to 2. These two 
columns are not really orthogonal but are nearly orthogonal (the correlation of columns 1 and 61 is zero, so the 
correlation of 61 with 2 should be very small since the correlation function is continuous as shown in Figure H). 
In general, we chose A = and visit the columns in the order i + A,i + 2A, • • • ,i + LA, i = 1, - ■ ■ , A — 1. 

Our simulation results show that the BER we get with exact calculation of Pj and that obtained when we set 
P i = / with subcarrier reordering are almost the same. Table Hill gives the computational complexity incurred in 
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TABLE III 

Estimated computational cost per iteration of the RLS algorithm with Carrier Reordering 



Term 


X 


+ 






2L + 2 


L 




\yii)-^X{z)afh,^,\^ 


1 


1 




p-f{i) 


1 




1 




1 


1 




hi 


L + 2 


L + 1 


1 




3 


1 


1 


Total per iteration 


4L + 13 


2L + 4 


3 



the RLS calculation when subcarrier reordering is used (i.e., free from Pi calculation). 

Note that with subcarrier reordering, the new version of the RLS runs without the need to use the power delay 
profile statistics, which relieves us from the need to provide this information. 



V. Computational Complexity in the High SNR Regime 

In the section, we study the other source of complexity (backtracking) and show that there is almost no 
backtrackingj in the high SNR regime. To this end, consider the behavior of the algorithm when processing the 
ith subcarrier. There are \Q\ different alphabet possibilities to choose from at this subcarrier and a similar number 
of possibihties at the preceding i — 1 subcarriers, creating a total of — 1 incorrect sequences and one 
correct sequence ^[i)- The best case scenario is to have only one sequence that satisfies M^p^.^ < r in which case 
there would be only one node to visit. The worst case is having to visit the remaining — 1 wrong nodes before 
reaching the true sequence (visiting of nodes will happen through backtracking); this latter case is equivalent to the 
exhaustive search scenario. Thus, if we let Cj denote the expected number of nodes visited at the zth subcarrier, 
then from above we can write 

C, < 1 + {\n\' - l)Pi (42) 



'The term "backtracking" refers to the case when the algorithm is currently at subcarrier i and it has to change the estimate of the 
data symbol at some subcarrier j < i. On the other hand, sweeping the constellation points at subcarrier to find the first one that satisfies 
^Xf^i-i < r is not considered backtracking. 
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where Pi is the maximum probabiUty that an erroneous sequence of symbols A'(j) 7^ A'(j) has a cost less than r. 
We will show that this probability becomes negligibly small at high SNR values. Recall that 



3^(,) = ^diag(A'(,))Af/i + A/', 



(43) 



where A/'(j) denotes the first i symbols of Af. Note the (l43l) can be written as 



Vpdiag(Af(i))AH / 



h 



(44) 



We first prove our claim for the LS cost and then show how the MAP cost reduces to LS cost for high SNR. 



A. LS cost 



The least squares cost is given by (see 1381, Chapter 11, pp. 663) 



^;e<„ = 3^;i)(/-Vpdiag(Ar(,))A?(VpAidiag(A',)^Vpdiag(A',)Af) ' ^A,diag(Arf )) 3^(,) 
= yf.^ (l-p diag(A',)Af {p Ai|diag(A'i)|2Af)"' Aidiag(Arf 



^^(. = yl)[i-D)yi,C) (45) 

where 

D = diagCA-^Af (A,|diag(Ar,)|'Af Aidiag(Arf ). (46) 
So the probability that the sequence A'^j^ satisfies -^^f^ ^ < t reads 

= Pr(M;e^^^ < r) 

Pr = V,(ylJl-D)y^,)<r\ (47) 
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In the strict sense of the word, backtracking means visiting Step 3 in our algorithm. Substituting (1441 ) in ( |47l) yields 



Pi = Pr 



/ 


/ 




H 






\ 






h 




h 


















< r 


V 


I 


















J 


/ 



where 



G 



(0 



I 



\I-D] 



^diag(A'(,))AH / 



Let B = diag(A'(j)) , then Gu) can be written as 



G 



pB^[I -D]B B^ [I -D]I 
I\I -D]B I\I -D]I 



which in compact form can be expressed as 



pE E2 



77'H TP 

-C'2 -^3 



Using the Chernoff bound the right hand side of (1481) can be bounded in the following way 









h 


H 


h 


\ 




Pi < e'^^'E 


exp 


















V 






















1 



Noting that 



h 



AA(0,Si) 



with 



Rh 
li 



(48) 



(49) 



(50) 



(51) 



(52) 



(53) 



(54) 



19 



we can solve the expression in ( [52b as 



exp 
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\ 
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h 












exp 










V 












V 
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Q-^^r^{L+^+l) 

r 1 ' \ 





exp 



ft. 



/xr~(L+j+l) 



e f^' vr 



(55) 



Note that the numerator in ( 1551 ) is a multi-variate complex Gaussian integral. Recall that an n-dimensional complex 
Gaussian integral has the solution (see |fT9l ) 



exp 



\W 



dx 



vr 



det{W) 



(56) 



This allows us to simplify (1551) as 



Pi < 



(57) 



det(5]i + ^G(i)) 

Next, we show that the probability — )• as p — )• oo. To show this, we just need to show that the largest eigenvalue 
of the term in the denominator goes to infinity as p — )■ oo. 

Lemma 1: Let E = Ajdiag(A'j ) [/ — D]diag{Xi)A^ be a (L + 1) x (L + 1) matrix, then for any sequence Xi, 
E has a positive maximum eigenvalue, Amax and a corresponding unit-norm eigenvector v of size (L + 1) x 1. 
Proof: Recall that 



D = diagiX i)Af Aidiag{xf)diagiXi)Af Aidiag(Arf ) 



(58) 



and let F = dmg{X i)AY, then we can write the above equation as 



d = f(f^f) ^ f^ = ff'^ 



(59) 
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where F'^ = [F^F) ^ F^ is the Moore-Penrose pseudo-inverse^ (see HTI . Chapter 5, pp. 422). Therefore, D is 
an idempotent matrix with eigenvalues equal to either or 1 POl and hence, [/ — D] is also a positive semi-definite 
idempotent matrix. Note also that the matrix E in dSTT ) can be written as 



E 



Aidia.g{Xi )[I - D]dia.g{Xi)Af 



B'^ll -D\B 



(60) 



and 



z"£;z = z"s"[/ - D]Bz = (-Bz)"[/ - D][B'i) > 



(61) 



and so E is Hermitian and positive semi-definite. 

Let ?7 = [ui U2 • • • ul_|_i] be a (L + 1) X (L + 1) unitary matrix where Uj is the ith eigenvector, then, 
E = UAU^ where A is a diagonal matrix containing ordered eigenvalues of E such that Ai > A2 > • • • > A^+i. 
Let z = ?7^v, then the maximum eigenvalue of E is given as 



TT 

max V Ew 

M|2 = l 



max z^Az 

INI|2 = 1 



L+l 



max 
ll^lh 



< max 



Ai|zi|^ 

i=l 
L+l 

ax Ai \zi\ 



i=l 

^ Ai — Amax 



(62) 
(63) 

(64) 
(65) 



The equality is attained when v is the eigenvector of Amax- ■ 
Lemma 2: Given that E has a positive maximum eigenvalue Amax with corresponding unit-norm vector v of 
size (L + 1) X 1, then the maximum eigenvalue of in (l50l ) is lower bounded by w^G(j)W = p Amax where 



w 



V(L+l)xl 
Oixl 



(66) 



'the columns of F are linearly independent. 
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Proof: From Lemma 1, the largest eigenvalue of E is Amax- It follows that the largest eigenvalue of pE is 
pAmax- Let A'j^ax be the largest eigenvalue of From dST] ). we can see that pE is a principal sub-matrix of 
(see EH, Chapter 7, pp. 494) and thus 

-^max — P^max 

(67) 

1. e., the largest eigenvalue of the principal sub-matrix pE is smaller than or equal to the largest eigenvalue of Gi-j) 
(see Hn . Chapter 7, pp. 551-552). Thus pAmax is a lower bound on the largest eigenvalue of G^^y ■ 

Note that is positive definite as it is a covariance matrix, hence it will have positive eigenvalues. From Lemma 

2, the maximum eigenvalue of Aj^^x — > oo as p — )• cxd. Thus the denominator in dSVl ) grows to infinity in the 
limit p — oo and 

lim Pi ^ (68) 

From (|42l) and (|68] ). we have 

lim Ci < 1 + - 1) lim Pi (69) 

p— >-oo p— >oo 

lim Cj < 1 (70) 

p— >oo 



B. MAP co5f 

The cost associated with the MAP solution of an erroneous sequence of symbols A'(j) ^ A'(j) is given as (see 
Il38l . Chapter 11, pp. 672) 

= {' + P dms{X^)AfRhA,dmg{xf)'^ ~^y^,) (71) 

Mathematically, 

Pi = Pr(M;e^^, < r) 

Pi = Fr(yf^(^I + pdmg{X,)AfRhA,dmg{xf)y'y^,)<r\ (72) 
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By matrix inversion lemma 



/ + diag{X,)AfRhAidmgiXl 



I-p diagiXi)Af + p Aidiag(A'f )diag(A',)Af J A^diagC^- 
/ - diag(A',) Af [i R-^ + Aidiag(Arf )diag(A'i) Af ] A,diag(A'f 
I-D 



(73) 
(74) 
(75) 



where 



D = diag(Ari)Aj 



- R^^ + Aidiag(A'f )diag(A'i)Af 
P 



AidiagCA-f) 



(76) 



Thus (1721) can be written as 



(77) 



note that dTT] ) is of the same form as ( |47] ). The only difference in the LS and MAP costs is the presence of the 
term ^ /?^^ in (ItHI ). Also note that this term depends on the inverse of the SNR. For low SNR, the inverse term in 
(1761 ) is always invertible due to the regularization term. At high SNR, the effect of regularization fades and inverse 
term in (1761 ) is invertible. At high SNR, i.e., p — t- oo, - R^^ ^ and D of (|75]) takes the same form as that of 



LS cost leading to (iTOl) . 



Table |IV] lists the estimated computational cost for our blind algorithm in the high SNR regime. Since there is 
no backtracking, the total number of iterations is N, which explains our calculations in Table JV] It thus follows 
that the total number of operations needed for our algorithm is of the order 0{LN) in high SNR regime. The pilot 
based approach for channel estimation needs to invert an (L + 1) x (L + 1) matrix (assuming we need L + 1 pilots 
to estimate a channel of length L + 1) with a complexity of the order O(L^). Since the cyclic prefix is a fixed 
fraction of the OFDM symbol (L = N/m with m typically set to m = 4 or 8) we see that the complexity of the 
two approaches become comparable in the high SNR regime. 
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TABLE IV 

Total computational cost of the ML blind and training based algorithms at high SNR 



Algorithm 


X 


+ 


Blind Algorithm 


(3L^ + 11L + 17)iV 


(2L^ + 5L + 4)A^ 


Blind Alg. with Carrier Reordering 


(4L + 13)A^ 


{2L + A)N 


Training Based Algorithm L3_9J 


4L^ + 17L + 13 


2L^ + 6L + 4 



VI. Simulation Results 

We consider an OFDM system with = 16, or 64 subcarriers and a CP of length L = ^. The uncoded data 
symbols are modulated using BPSK, 4-QAM, or 16-QAM. The constructed OFDM signal then passes through a 
channel of length L + I, which is assumed to be block fading (i.e., constant over one OFDM symbol but fades 



independently from one symbol to another) and whose taps follow an exponential decay profile 



-Q.2t 



)■ 



A. Bench marking 

We compare the performance of our algorithm against the following receivers 

1) the subspace-baseqj blind receiver of ifTOll . 

2) the sphere decoding based receiver of [281, 

3) a receiver that acquires the channel through training with L + 1 pilots and a priori channel correlation Rh 



4) the ML receiver that acquires data through exhaustive search. 
The simulations are averaged over 500 Monte-Carlo runs. 

Figure [3] compares the BER performance of our algorithm with the aforementioned algorithms for an OFDM 
system with A^ = 16 subcarriers and BPSK data symbols. Note in particular that our blind algorithm outperforms 
both the subspace and sphere decoding algorithms and almost matches the performance of the exhaustive search 
algorithm for low and high SNR, which confirms the ML nature of the algorithm. 

Figure |4l which considers the 4-QAM case, shows the same trends observed for the BPSK case of Figure [3] 

Figure [5] considers a more realistic OFDM symbol length (A^ = 64), drawn from a 4-QAM constellation and 
allows the SNR to grow to 45 dB. Our blind algorithm shows no error floor signs, which is characteristic of non-ML 

^The block fading assumption is maintained for all simulations. However, for the subspace blind receiver of 1 10] to work, the channel 
needs to stay constant over a sequence of OFDM symbols. For this particular receiver, the channel was kept fixed over 50 OFDM symbols. 
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SNR(dB) 

Fig. 4. BER vs SNR for 4-QAM OFDM over a Rayleigh channel with TV = 16 and L = 3 



methods. Furthermore, the algorithm beats the training-based method and follows closely the performance of the 
perfect channel case. Figure [6] shows the results with = 64 subcarriers and 16-QAM data symbols for SNR as 
large as 50 dB. Again, the proposed blind algorithm does not reach an error floor. 
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SNR(clB) 

Fig. 5. BER vs SNR for 4-QAM OFDM over a Rayleigh channel with iV = 64 and i = 15 




SNR(dB) 

Fig. 6. BER vs SNR for 16-QAM OFDM over a Rayleigh channel with iV = 64 and L = 15 



B. Low-Complexity Variations 



In this subsection, we investigate the low-complexity variants of our algorithm. Specifically, we consider the 
performance of the blind algorithm with 
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1) Pi set to /, 

2) Pi set to / with subcarrier reordering 

Figure |7] exhibits the comparisons for the various algorithms for BPSK and N = 16. Note that with Pi set to / 
arbitrarily, the performance of the blind algorithm deteriorates and the BER reaches an error floor. Contrast this 
with the algorithm variant that uses subcarrier reordering as well, and note that the performance of this variant 
follows closely the performance of the exact blind algorithm. Also note that the BER of both of these algorithms 
beats that of the sphere decoding algorithm of |[28l . The same trends are observed in Figure |8j which considers the 
4-QAM case. 
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Fig. 7. Comparison of low-complexity algorithms for BPSK OFDM with = 16 and L = 3 



Figure |9] compares the average runtime of various algorithms as a function of the SNR. Note first that the extreme 
cases are the training-based receiver and the exhaustive search receiver, both of which are independent of the SNR. 
The runtime of the proposed algorithm decreases with the SNR and is sandwiched in-between the run time of the 
sphere decoding algorithm and that of the subspace algorithm for all values of the SNEq Note that in the high 
SNR regime our algorithm runs at the same speed as the subspace algorithm. 

^The runtime of the subspace algorithm is adjusted to account for the fact that it requires the channel to be constant over a block of L + 1 
OFDM symbols. 
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D Blind Algo. with subcarrier reordering 
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Fig. 8. Comparison of low-complexity algorithms for 4-QAM OFDM with 16 and L = 3 

Figure [10] shows the average runtime of the proposed algorithm with = 16 for various modulation schemes 
(BPSK, 4-QAM and 16-QAM). It is clear from the figure that the average runtime decreases considerably at higher 
SNR values. 
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Fig. 9. Average time comparison for BPSK data symbols with A^ = 16 and L = 3 



28 



10^ 




10"! ' ' ' ' ' ' ' 

5 10 15 20 25 30 35 
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Fig. 10. Average Time Comparison for our Blind Algorithm for Different Modulation with iV = 16 and L = 3 

VII. Conclusion 

In this paper, we have proposed a low-complexity blind algorithm that is able to deal with channels that change 
on a symbol by symbol basis allowing it to deal with fast block fading channels. The algorithm works for general 
constellations and is able to recover the data from output observations only. Simulation results demonstrate the 
favorable performance of the algorithm for general constellations and show that its performance matches the 
performance of the exhaustive search for small values of N. 

We have also proposed an approximate blind equalization method (avoiding Pi with subcarrier reordering) to 
reduce the computational complexity. As evident from the simulation results, this approximate method performs 
quite close to the exact blind algorithm and can work properly without a priori knowledge of the channel statistics. 
Finally, we study the complexity of our blind algorithm and show that it becomes especially low in the high SNR 
regime. 
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Abstract — This paper proposes a low-complexity algorithm for 
blind equalization of data in OFDM-based wireless systems with 
general constellation. The proposed algorithm is able to recover 
data even when the channel changes on a symbol-by-symbol 
basis, making it suitable for fast fading channels. The proposed 
algorithm does not require any statistical information of the 
channel and thus does not suffer from latency normally associated 
with blind methods. We also demonstrate how to reduce the 
complexity of the algorithm, which becomes especially low at 
high SNR. Specifically, we show that in the high SNR regime, the 
number of operations is of the order 0{LN), where L is the cyclic 
prefix length and A'^ is the total number of subcarriers. Simulation 
results confirm the favorable performance of our algorithm. 

Index Terms — OFDM, channel estimation, maximum- 
likelihood detection, maximum a posteriori detection and 
recursive least squares. 



I. Introduction 

Modern wireless communication systems are expected to 
meet an ever increasing demand for high data rates. A major 
hindrance for such high data rate systems is multipath fading. 
Orthogonal frequency division multiplexing (OFDM), owing 
to its robustness to multipath fading, has been incorporated 
in many existing standards (e.g., IEEE 802.11, IEEE 802.16, 
DAB, DVB, HyperLAN, ADSL etc.) and is also a candidate 
for future wireless standards (e.g., IEEE 802.20). All current 
standards use pilot symbols to obtain channel state information 
(needed to perform coherent data detection). This reduces 
the bandwidth available for data transmission, e.g., the IEEE 
802. lln standard uses 4 subcarriers for pilots, that is 7.1% 
of the available bandwidth, of the 56 subcarriers available for 
transmission. Blind equalization methods are advantageous as 
they do not require regular training/pilots symbols, thus freeing 
up valuable bandwidth. 

Several works exist in literature on blind channel estimation 
and equalization. A brief classification of these works based 
on a few commonly used constraints/assumptions is given in 
Table|I](note that this list is not exhaustive). Broadly speaking, 
the literature on blind channel estimation can be classified into 
maximum-likelihood (ML) methods and non-ML methods. 

The non-ML methods include approaches based on sub- 
space techniques ifTI- llTOI . second-order statistics ITTI . lfT2l . 
ifTsl . Cholesky factorization llT4l . iterative methods ifTsl . vir- 
tual carriers lfT6i real signal characteristics ifTTl and linear 



precoding lO, |[T8l. Subspace-based methods IT\-^, Q- 
flOl generally have lower complexity but suffer from slow 
convergence as they require many OFDM symbols to get an 
accurate estimate of the channel autocorrelation matrix. Blind 
methods based on second-order statistics ifTTI . lfT2ll . lfT3 l also 
require the channel to be strictly stationary over several OFDM 
blocks. More often than not, this condition is not fulfilled 
in wireless scenarios (e.g., as in WLAN and fixed wireless 
applications). Methods based on Cholesky's factorization llT4ll 
and iterative techniques ifTSl suffer from high computational 
complexity. 

Several ML-based bUnd methods have been proposed in lit- 
erature EOl, mi, lED-lISa, ES. Although they incur a higher 
computational cost, their superior performance and faster 
convergence is very attractive. These characteristics make this 
class of algorithms suitable for block fading scenarios with 
short channel coherence time. Usually, suboptimal approxima- 
tions are used to reduce the computational complexity of ML- 
based methods. Though these methods reduce the complexity 
of the exhaustive ML search, they still incur a significantly 
high computational cost. Some methods like 1211 . ||23]| . ll24l 
are sensitive to initialization parameters, while others work 
only for specific constellations (see Table A few ML- 
based algorithms allow the channel to change on a symbol- 
by-symbol basis (e.g., If26l . If37l ). however, these algorithms 
are only able to deal with constant modulus constellations. 

To the best of our knowledge no blind algorithm in literature 
is able to deal with channels that change from one OFDM 
symbol to another when the data symbols are drawn from 
a general constellation. Contrast this with the equalization 
algorithm presented in this paper. The key features of the blind 
equalization algorithm presented in this paper are that it 

1) works with an arbitrary constellation, 

2) can deal with channels that change for one symbol to 
the next, 

3) does not assume any statistical information about the 
channel. 

In addition, we propose a low-complexity implementation of 
the algorithm by utilizing the special structure of partial EFT 
matrices and prove that the complexity becomes especially low 
in the high SNR regime. 

The paper is organized as follows. Section Ull describes the 
system model and Section |III] describes the blind equalization 
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TABLE I 
Literature Classification 



Constraint 


Limited by 


Not limited by 


Channel constant over 
M symbols, Af > 1 


ri"i i^"i 111 ici I/""! 1^1 ir^i iir\i 

|2|, |3|, |5|, |6|, |7|, |9|, |10|, 
UJJ, |12|, |13|, |14|, |15|, |16|, 
El, ESI, |20|, 121 1, |24|, [271, 


(26), (371 


Uses pilots 
to resolve 
phase ambiguity 


in, |5|, 191, ii()i, mi, 

fr41, |15|, |16|, |18|, 1201, 
1211, |28|, |25|, |26|, |3T| 


(36) 


Constant modulus constellation 


[21, |3|, |6|, 191, 1 121, 1 131, 
L15J, [16J, [201, |21J, 

01, Ei, (27), oi, (13 


(D, d), (D, (ID, na 

(n), (19), (28) 



algorithm. Section |IV] presents an approximate method to 
reduce the computational complexity of the algorithm, while 
Section |V] evaluates this complexity in the high SNR regime. 
Section |VI] presents the simulation results and Section IVIII 
gives the concluding remarks. 

A. Notation 

We denote scalars with small-case letters, x, vectors with 
small-case boldface letters, x, while the individual entries of a 
vector h are denoted by h{l). Upper case boldface letters, X, 
represent matrices while calligraphic notation, X, is reserved 
for vectors in the frequency domain. A hat over a variable 
indicates an estimate of the variable, e.g., h is an estimate 
of h. {.y- and (.)^ denote the transpose and Hermitian 
operations, while the notation stands for element-by-element 
multiplication. The discrete Fourier transform (DFT) matrix 
is denoted by Q and defined as qi^k — e^-'T^^'"^)'^'^^^' with 
k,l = 1,2, ■ ■ ■ , N (N is the number of subcarriers in the 
OFDM symbol), while the invrse DFT (IDFT) is denoted as 
Q^. The notation ||a||B represents the weighted norm defined 
as IIqIIb = a^Ba for some vector a and matrix B. 

II. System Model 

Consider an OFDM system where all the A'' available 
subcarriers are modulated by data symbols chosen from an 
arbitrary constellation. The frequency-domain OFDM symbol 
X, of size X 1, undergoes an IDFT operation to produce 
the time-domain symbol x, i.e. 



respectively, i.e. 



NQ^X. 



(1) 



The transmitter then appends a length L cyclic prefix (CP) to x 
and transmits it over the channel. The channel h, of maximum 
length i+1 < A^, is assumed to be constant for the duration of 
a single OFDM symbol, but could change from one symbol to 
the next. The received signal is a convolution of the transmitted 
signal with the channel observed in additive white circularly 
symmetric Gaussian noise n ^ Af{0, 1). The CP converts the 
linear convolution relationship to circular convolution, which, 
in the frequency domain, reduces to an element-by-element 
operation. Discarding the CP, the frequency-domain received 
symbol is given by 



-AT, 



(2) 



y = ^nQX 

where p is the signal to noise ratio (SNR) and y, H, X ,M , 
are the iV-point DFT's of y, h, x, and additive noise n 



n = Q 

1 



h 




X = -j=Qx, 



Qn, and y = -j=Qy- 

N VN 



(3) 



Note that h is zero padded before taking its iV-point DFT. Let 
consist of first L + 1 columns of Q (i.e., A consist of 
first L + 1 rows of Q"), then 



n = A^h and h = AH. 
This allows us to rewrite Q as 

y ^ y/^ dmg{X)A^h + Af. 



(4) 



(5) 



III. Blind Equalization Approach 

Consider the input/output equation (|5]l, which in its element 
by element form reads 

y{j) = VP^ijKh+Af{j) (6) 

where aj is the jth column of A. The problem of joint ML 
channel estimation and data detection for OFDM channels can 
be cast as the following minimization problem 



Jml — 



^ min^ \\y - VP diag{X)A^h\ 



N 



i—1 



N 

E \y{j)-Vpx{j)afh\ 



(7) 



where il^ denotes the set of all possible A^— dimensional 
signal vectors. Let us consider a partial data sequence Af(i) 
up to the time index i, i.eQ 

and define Mx^^^ as the corresponding cost function, i.e. 

M;t(., = min \\y^,) - ^ dia.g{X ^,))Al^hf , (8) 



'Thus, for example -^(2) 
,X{N)]'^ = X. 



[A'(l), A'(2)]T and X 



(JV) 



3 



where A^-j consists of the first i rows of A}^. 

In the following, we pursue an idea for blind equalization 
of single-input multiple-output systems first inspired by |19|. 
Let R be the optimal value for the objective function (|7]i (we 
show how to determine R in Section IIII-BI further ahead). If 
M;f(i) > Ri then X{^i) can not be the first i symbols of the 

-ML -ML - ML 

ML solution X to (|7J). To prove this, let X and h 
denote the ML estimates and suppose that our estimate A'(j) 
satisfies 

- ML 

-^(0 = X^,) (9) 

i.e. the estimate Ar(j) matches the first i elements of the ML 
estimate. Then, we can write 

R = min \\y - Jpd\&g{X)A^hf 
= \\y^.)-^diagix'^))Af,^h^''\\' 

N 

+ E |y(.?)-^^5A'^'^.?>f/^ 



ML, 



\y{^) - \/pdiag(Ar(,))A[^)/i 



-ML, 



N 



- - ML , 



+ E \y{j)^vpx'^''ij)afh "|^ (10) 

where the last equation follows from (|9]l. Now, clearly 



ML, 



||y(,)-Vpdiag(Ar(,))A[^)/i 

> mm\\y^,)-^diag{X^,))Al^hf (11) 
= ||y(.)-Vpdiag(Ar(,))A[^)/i||2, (12) 

where h is the argument that minimizes the RHS of ( fTTT i. Then 



R = ||3^(,o-Vpdiag(A'(,))A[^)^ 



-ML, 



N 



- ML, 



+ E \yu)-v~px{j)afh 
j=i+i 

> mm ||3^(,) - diag(Af(,))A[^)/if 



(13) 



chosen by least-squares to make Mv> in (fTSl l equal to zercH 



Then, we will need at least L + 1 pilots defying the blind 
nature of our algorithm. Alternatively, our search tree should 
be at least L + 1 deep before we can obtain a nontrivial (i.e. 
nonzero) value for M 



X, 



(0 



An alternative strategy would be to find h using weighted 
regularized least squares. Specifically, instead of minimizing 
the objective function J]\il in equation (|7]i, we minimize the 
maximum a posteriori (MAP) objective function 



J MAP = min 



{\Ml-^ + \\y-VP diag(A')AH/i||2| (14) 



where Rh is the autocorrelation matrix of h (in Section 
IIVI we modify the blind algorithm to avoid the need for 
channel statistics). Now the objective function in ( fT4b can be 
decomposed as 



Jmap - min < + E I3^(j) - Vp Xij)afh\' 



,H. |2 



N 



+ E \yij)-^pXij)afh\' 
Given an estimate of Xfi_i), the cost function reads 



(15) 



X(i-i) 



nin|||/i||^_i + ~ Vpdiag(A'(,„i))A[^_i)/i||2](16) 



with the optimum value (see |[38ll . Chapter 12, pp. 671) 

H 



So, for A^(j^to correspond to the first i symbols of the ML 

solution Xi^^, we should have < R- Note that the an additional regressor X{i)af, i.e. 



h = ^/?ftA(,_i)diag(Af 

[7 + pdiag(Ar(,,_i))A[J_i)/?hA(,,_i)diag(Ar|^_i))]-iy(,_i) 

(17) 

and corresponding minimum cost (MMSE error) 

mmse=[/?,;^+pA(,„i)diag(i'(,_i))"diag(Ar(,_i))A[^_i)]"i 

(18) 

If we have a guess of X(i), we can update the cost function 
and obtain My . In fact, the cost function My is the same 
as that of M^, with the additional observation y(i) and 



'X, 



above represents a necessary condition only. Thus if is 
such that M-j^ < R, then this does not necessarily mean that 

- ML 

Afj-j) coincides with X^^-^ . 

This suggests the following method for blind equalization. 
At each subcarrier frequency i, make a guess of the new value 
of X(i) and use that along with previous estimated values 
X{1), ...,X{i — 1) to construct X(iy Estimate h so as to 
minimize M^ in ( fTsT i and calculate the resulting minimum 
value of My . If the value of My < R, then proceed to 
i + 1. Otherwise, backtrack in some manner and change the 
guess of X{j) for some j < i. A problem with this approach is 
that for i < L + 1, given any choice of X{i), h can always be 



My 



y{i-i) 
y{^) 



mm 

h 



i^{\\h\\l-, 









1) 


h 





K19) 



We can thus, recursively update the value M-^^^ based on 
M-y using recursive least squares (RLS) ['38|, i.e. 



My. . = My 



+ lii)\yii)-^Xii)afh,^i\^ (20) 



-Since ^^Jj is full rank for i < L + 1, diag(A'(jj)A^j is full rank too for 
each choice of diag(Af(j)) and so we will always find some h that will make 
the objective function in U3t zero (since h has L + 1 degrees of freedom). 
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where 



9^ 

7(i) 



1 



(21) 

(22) 
(23) 



P, = P,;_l-p7(^)|A'(^)|2p,_la,afP,_l (24) 

These recursions apply for all i and are initialized by 

My =0, P-i = Rh, and = 

Now, let i? be the optimal value for the regularized objective 
function in ( fT4] i. If the value R can be estimated, we can re- 
strict the search of the blind MAP solution X to the offsprings 
of those partial sequences X^^^-j that satisfy M^^^ < B. This 
forms the basis for our exact blind algorithm described below. 

A. Exact Blind Algorithm 

In this subsection, we describe the algorithm used to find 
the MAP solution of the system. The algorithm employs the 
above set of iterations (|20]|— (l24b to update the value of the cost 
function M.^^ ^ which is then compared with the optimal value 
R. The input parameters for the algorithm are: the received 
channel output y, the initial search radius r, the modulation 
constellatiorO and the 1 x index vector /. 

The algorithm is described as follows (the algorithm is also 
described by the flowchart in Figure [U 

1) (Initialize) Set i = 1, = 1 and set X{i) = 

2) (Compare with bound) Compute and store the metric 

. If > r, go to 3; else, go to 4; 

3) (Backtrack) Spind the largest 1 <j <i such that 
I{j) < |f2|. If there exists such j, set i = j and go to 
5; else go to 6. 

4) (Increment subcarrier) Ifi < A^ set i = i + = 1, 
X{i) ~ and go to 2; else store current X/^^), 



update r = Mp and go to 3. 



1 and 



5) (Increment constellation) Set 

X{i) = Go to 2. 

6) (End/Restart) If a full-length sequence <^(Ar) has been 
found in Step 4, output it as the MAP solution and 
terminate; otherwise, double r and go to 1. 

The essence of the algorithm is to eliminate any choice 
of the input that increments the objective function beyond 
the radius r. When such a case is confronted, the algorithm 
backtracks (Step 3 then Step 5) to the nearest subcarrier whose 
alphabet has not been exhausted (the nearest subcarrier will 
be the current subcarrier if its alphabet set is not exhausted). 

The other dimension the algorithm works on is properly 
sizing r; if r is too small such that we are not able to backtrack, 
the algorithm doubles r (Step 3 then Step 6). If on the other 
hand r is too large that we reach the last subcarrier too fast, the 

^Examples of the modulation constellation are f7 are 4-QAM and 16- 
QAM. We use to denote the constellation size and n(fc) for the fcth 
constellation point. For example, in 4-QAM |S1| = 4 and fi(l),--- ,fi(4) 
are the four constellation points of 4-QAM. The indicator refers to the 
last constellation point visited by our search algorithm at the ith subcarrier. 



algorithm reduces r to the most recent value of the objective 
function, (r = Mx^^,-) and backti-acks (Step 4 then Step 3). 

Remark 1: The backtracking algorithm depends heavily on 
calculating the cost function using (I20]l- (l24l i. In the constant 
modulus case, the values of in equations ( l23l l and 

(l24l i become constant (equal to p £x) for all i, and the values 
of 7(i) and Pi become 

7(*) = , ^ ^ ^Hp (25) 

P, = P,^,- p£x-f{^)P^^la,afP,^l, (26) 

which are independent of the transmitted signal and thus can 
be calculated offline. 

Remark 2: The algorithm can also be used for a pilot-based 
standard. In this case, when the algorithm reaches a pilot 
holding-subcarrier, no backtracking is performed as the value 
of the data carrier is known perfectly. In the presence of pilots, 
it is wise to execute the algorithms over the pilot-holding 
subcarriers first and subsequently move to the data subcar- 
riers. For equispaced comb-type pilots, (semi)-orthogonality 
of regressors is still guaranteed. 

Remark 3: Like all blind algorithms, we use one pilot bit to 
resolve the sign ambiguity (see references in Table |l]i. 

B. Detennination of initial radius p, Ry^ and r 

Our algorithm depends on p, Rh and r which we need to 
determine. The receiver can easily estimate p by measuring 
the additive noise variance at its side. As for the channel 
covariance matrix Rh, our simulations show that with carrier 
reordering we can replace Rh with identity with essentially 
no effect on performance. This becomes especially true in the 
high SNR regime. It remains to obtain an initial guess of the 
search radius r. To this emd, note that if h and X are perfectly 
known (with h drawn from A/^(0, Rh) but is known) then 



\h\ 



\y-^pdiag{X)A'^h\\ 



(27) 



is a chi-square random variable with k = 2{N + L + 1) degrees 
of freedorr0- Thus, the search radius should be chosen such 
that > r) < €, where > r) = 1 - F{r: k), and 
where F{r; k) is the cumulative distribution function of the 
chi-square random variable given by 

7(fc/2, r/2) 



F{r- k) 



(28) 



r(fc/2) ' 

Here, 7(fc/2, r/2) is the lower incomplete gamma function 
defined as 



7(^2, r/2) 



■/2 



2-1 



dt. 



(29) 



So, under this initial radius, we guarantee finding the MAP 
solution with probability at least 1 — e. In case a solution is 
not found, the algorithm doubles the value of r and starts over 
This process continues until a solution is found. For example, 

^The first term on the right hand side has 2{L + 1) degrees of freedom as 
h is Gaussian distributed while the second term has 2N degrees of freedom 
as y — -y/p diag(Af) A^h. is just Gaussian noise. 
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Fig. 1. Flowchart of the bUnd algorithm. 



when N ~ 64, L = 15 and e — 0.01, the value of our radius 
should be set to 204. 

IV. An Approximate Blind Equalization Method 

There are two main sources that contribute to the complexity 
of the exact blind algorithm of Section [Illl 

1) Calculating Pi: the second step of the blind algorithm 
requires updating the metric Af^^^ . This metric depends 
heavily on operations involving the {L + 1) x (i + 1) 
matrix Pi which are the most computationally expansive 
(see Table which estimates the computational com- 
plexity of the RLS). 

2) Backtracking: When the condition M-^ < r is not 
satisfied, we need to backtrack and pursue another 
branch of the search tree. This represents a major source 
of complexity. 

In the following, we show how we can avoid calculating Pi 
all together. We postpone the issue of backtracking to Section 

m 

A. Avoiding Pi 

Note that in the RLS recursions (|20]i— (|24]|. Pi always 
appears multiplied by a^. Let's see how this changes if we 
set P i = I and assume that the a/s are orthogonal or, in 



particular, if we assume that af^ai+i — — 0. With 

these assumptions note that 

7(0) ^ ^ 



1 + p |A'(0)|2aHp_iao 1 + p |A'(0)|2(i + 1) 

(30) 

i.e., 7(0) is independent of P i. Also note that 

Poai = P_iai-p7(0)|A'(0)|2p_iaoa?P-iai 
= ai- pj{0)\X{0)faoa^ai 
= ai. (31) 

For a similar reason 

Poa2 = 02. (32) 
From dSTT i. it is also easy to conclude that 



7(1) = 



1 



(33) 



1+^1-^(1)12(^+1) 

i.e., 7(1) is independent of Pq. Also, from (|3TI ) and (l32b it 
follows that PiUi^i = a,:+i and Piai+2 = 0-1+2- We now 
investigate what happens to Pi+i- 

Pi+lO'i+2 

= Pia,+2 - pj{i + + l)f Piai+iaf_^iPia,+2 

= ai+2 - Pl{i + l)l'^'(* + l)pai+ia^ia.i+2 

= a^+2■ (34) 
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TABLE II 

Estimated computational cost per iteration of the RLS 
algorithm 



Term 


X 


+ 




\y(i) - ^ X(i)afh,,i\^ 
P7(«) 

hi 

9i 

afPi-iai 
7(0 

P^ 


2L + 2 
1 
1 
1 

L + 2 
L2 + 2L + 1 
L + 3 
L + l 
3 

+ 2L + 1 
+ 2L + 2 


L 
1 

1 

L+l 
L'^ + L 

L 
1 

+ L 
+ 2L + 1 


1 
1 

1 


Total per iteration 


3L^ + IIL + 17 


2L^ +5L + 4 


3 



Similarly, 



(35) 



So, by induction we see that each occurrence of P^a^ in the 
recursion set (|20]|-(|23T| can be replaced with a^. This allows 
us to discard ( |24] |. i.e.. 



h, = (3^W- VpA'(i)af/i,_i), (37) 



where 



7(i) 



1 

1 



1) 



(38) 
(39) 



Thus, the approximate blind RLS algorithm is effectively 
running at LMS complexity. Table summarizes the com- 
putational complexity incurred in the RLS calculation. 

B. Avoiding Pi with Carrier Reordering 

The reduction in complexity above is based on two assump- 
tions. The first assumption is to set P„i — I (instead of 
Rh) and the second is to assume that the consecutive a^'s are 
orthogonal. Note that the Oj's are columns of A, i.e. they are 
partial FFT vectors. As such, strictly speaking, they are not 
orthogonal. Notice, however, that for i ^ i' , 

L 



k=0 



(40) 



which after straightforward manipulation can be shown to be 

L + l, (i = i') 



1 

L+l 



(41) 



sin{-K{i-i )-^) 

This is a function of [i — i') mod N . Thus, without loss of 
generality, we can set i' = 1 and plot this autocorrelation with 
respect to i. The autocorrelation decays with i as shown in 
Figure |2] We can use this observation in implementing our 
blind RLS algorithm. Specifically, note that the whole OFDM 
data is available to us and so we can visit the data subcarriers 
in any order we wish. The discussion above shows that the data 
subcarriers should be visited in the order i, i + A, i + 2A, ... 




Fig. 2. Autocorrelation vs i for A'^ = 64 and L = 15 



TABLE III 

Estimated computational cost per iteration of the RLS 

ALGORITHM WITH CARRIER REORDERING 



Term 


X 




+ 






2L -t 


- 2 


L 






1 




1 




p-/{i) 


1 






1 


Xii) 


1 




1 




hi 


L + 


2 


L+l 


1 


7(i) 


3 




1 


1 


Total per iteration 


4L + 


13 


2L + 4 


3 



where A should be chosen as large as possible to make 
o-i , a,j+A , cti+2A , • • • as orthogonal as possible, but small 
enough to avoid revisiting (or looping back to) a neighborhood 
too early. We found the choice A = to be a good 

compromise. From Figure |2] which plots ( 1411 1 for iV = 64 
and L — 15, columns 1, 5, 9, 13, 17, 21, • • • , 61 are orthogonal 
to each other and so are the columns 2, 6, 10, 14, 18, • • • , 62. 
So, if the vectors are visited in the following order 
1,5,9, 13, 17, 21, • • • , 61, 2, 6, 10, 14, 18, • • • , 62, • • • , then we 
have a consecutive set of vectors that are orthogonal. The 
only exception is in going from column 61 to 2. These two 
columns are not really orthogonal but are nearly orthogonal 
(the correlation of columns 1 and 61 is zero, so the correlation 
of 61 with 2 should be very small since the correlation 
function is continuous as shown in Figure |2]i. In general, 
we chose A — and visit the columns in the order 

2 + A,i + 2A, • • • ,i + LA,i = 1, • • • , A - 1. 

Our simulation results show that the BER we get with exact 
calculation of Pi and that obtained when we set P_i — I 
with subcarrier reordering are almost the same. Table Hill gives 
the computational complexity incurred in the RLS calculation 
when subcarrier reordering is used (i.e., free from Pi calcu- 
lation). 

Note that with subcarrier reordering, the new version of the 
RLS runs without the need to use the power delay profile 
statistics, which relieves us from the need to provide this 
information. 
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V. Computational Complexity in the High SNR 
Regime 

In the section, we study the other source of complexity 
(backtracking) and show that there is almost no backtrackingf] 
in the high SNR regime. To this end, consider the behavior of 
the algorithm when processing the ith subcarrier There are 
different alphabet possibilities to choose from at this subcarrier 
and a similar number of possibilities at the preceding i — 1 
subcarriers, creating a total of |rip — 1 incorrect sequences 
X/^i) and one correct sequence X(iy The best case scenario 
is to have only one sequence that satisfies Mp^^,^ < r in which 
case there would be only one node to visit. The worst case is 
having to visit the remaining — 1 wrong nodes before 
reaching the true sequence (visiting of nodes will happen 
through backtracking); this latter case is equivalent to the 
exhaustive search scenario (i.e., all possible sequences satisfy 
-^x^Q ^ '')■ Thus, if we let Ci denote the expected number 
of nodes visited at the ith subcarrier, then from above we can 
write 



(42) 



where P.^ is the maximum probability that an erroneous 
sequence of symbols A'(j) ^ Xf^.^^ has a cost less than r. 
We will show that this probability becomes negligibly small 
at high SNR values. Recall that 



yit) = Vp diag(Af + ATj,) 



(43) 



where ■Af(i) denotes the first i symbols of A/". Note the (l43T l 
can be written as 



y^,-) = 7pdiag(A'(,))A[^) I 



h 



(44) 



We first prove our claim for the least squares (LS) cost and 
then show how the MAP cost reduces to LS cost for high 
SNR. 



A. LS cost 

Suppose we have an erroneous sequence of symbols Af (j) ^ 
X(i). The LS estimate of h is found by minimizing the 
objective function 



{li:>^W-Vpdiag(Af(,))A[^)/i||2} (45) 
Chapter 12, pp. 664) 



The cost associated with the LS solution is given by (see f38l. 
Chapter 11, pp. 663) 



Mx^^= [I-Vp diag(A'(,))AH A(,)diag(A'(,))H 
V^diag(Ar(i))A[^) ^A(,)diag( at") 
= yf,^ (l-p diag(Af(,))AH (pA(,)|diag(Ar(,))|MH ) 
A(^,)dmg{X^,))jy^,) 

= yUi--pD)y^^) 

yf^^{i-D)y^,^ (47) 



Mx 
where 



D = diag(Af(,))A[^) (^A(,)|diag(Af(,))|^A[l) j A(,)diag(Ar(;)). 

(48) 

So the probability that the sequence A'(j) satisfies M^^^,^ < r 



reads 



= Pr(M;e^^, < r) 

= pr (:y^[^) (/-£>) < 



(49) 



In the strict sense of the word, backtracking means visiting 
Step 3 in our algorithm. Substituting ( |44| ) in ( |49] l yields 



Pi = Pr 



h 



G 



where 



7p A(,)diag(Af(i)) 
I 



\I~D 



h 

Afi^y 



< r 



(50) 



^diag(Ar(,))A[J) / 



Let B = diag(Af then G(i) can be written as 



p [I D]B B^ [I D] I 
I[I D]B I[I D]I 



which in compact form can be expressed as 



pE E2 



(51) 



(52) 



(53) 



Jls = min 
and the solution of h is (see 

h = [A(,;)diag(irJ^))diag(A:(,))A[^)]-i7^ A(,)diag(Ar" 

(46) 

'The term "backtracking" refers to the case when the algorithm is cun'ently 
at subcarrier i and it has to change the estimate of the data symbol at some 
subcarrier j < i. On the other hand, sweeping the constellation points at 
subcanier to find the first one that satisfies Mx^^^ < r is not considered 
backtracking. 



Using the Chernoff bound the right hand side of (l50b can be 
bounded in the following way 



P, < e^'''E 
'Noting that 

with 



exp —p 



h 



h 



AT, 



G 



h 



(i). 



'AA(0,Sm) 



Rh 
I, 



(54) 



(55) 



(56) 



g 



we can solve the expression in (|54] i as 

H 



exp — /i 



h 



AT, 



(0 



G 



h 



AT, 



(0 



-/^r_(L+i+l) 



exp 



/expj 



H 



(S(,)+mG(,)) 



- fir ~{L+i+l) 



exp 



dhdAf, 



{i)> 



Note that the numerator in i5% is a muhi-variate complex 
Gaussian integral. Recall that an n-dimensional complex 
Gaussian integral has the solution (see [|19I ) 



j exp(-||x| 



W 



det(VF) ■ 



This allows us to simplify i5% as 



(58) 



(59) 



Next, we show that the probability Pi — > as p — > oo. To 
show this, we just need to show that the largest eigenvalue of 
the term in the denominator goes to infinity as p — > oo. 

Lemma 1: Let E = A(i)diag(A'(j))[J-D]diag(Af(i))A(^) 
be a (L + 1) X (L+ 1) matrix, then for any sequence -^(i), E 
has a positive maximum eigenvalue, Amax and a corresponding 
unit-norm eigenvector v of size (i + 1) x 1. 
Proof: Recall that 



D = diag(Ar(,))A^) (A(,)diag(Af"))diag(Ar(,))A 



A(,)diag(Ar(,)) 



(60) 



and let F — diag(Af(i))A(j), then we can write the above 
equation as 



D = F [f^F^ ^ F^ = FF^ 



(61) 



where F 



t _ 



F is the Moore-Penrose pseudo- 

invers43 (see f4Tl, Chapter 5, pp. 422). Therefore, D is an 
idempotent matrix with eigenvalues equal to either or 1 ll40l 
and hence, [I — D] is also a positive semi-definite idempotent 
matrix. Note also that the matrix E in ( l53T l can be written as 

E = A(,)diag(ArJ^))[/-r>]diag(Af(,))A^) 

= B^[I D]B (62) 

and 

z^Et. = z"b"[J - D]Bz = {Bzf[I - D]{Bz) > (63) 
and so E is Hermitian and positive semi-definite. 

*the columns of F are linearly independent. 



Let = [ui U2 ••• Ui+i] be a + X + unitary 
matrix where is the ith eigenvector, then, E = UAU^ 
where A is a diagonal matrix containing ordered eigenvalues 
of E such that Ai > A2 > • • • > A^+i. Let z = C/"v, then 
the maximum eigenvalue of E is given as 



max v^E-v = max z^Az 

l|v||2 = l ||z||2 = l 

L + 1 



= max 

11^112 = 



(64) 
(65) 



L + l 

< max Ai V (66) 

ll-l|2 = l 

< Ai = Aniax (67) 

The equality is attained when v is the eigenvector of Amax- 

■ 

Lemma 2: Given that E has a positive maximum eigen- 
value Amax with corresponding unit-norm vector v of size 
(i + 1) X 1, then the maximum eigenvalue of G(i) in (|52] | is 
lower bounded by w^G(i)W — p Amax where 



V(L+l)xl 



(68) 



Proof: From Lemma 1, the largest eigenvalue of E is 
Amax- It follows that the largest eigenvalue of pE is pAmax- 
Let AJj^jj^ be the largest eigenvalue of G(i). From i53[ . we 
can see that pE is a principal sub-matrix of G(i) (see |41|, 
Chapter 7, pp. 494) and thus 



(69) 



i.e., the largest eigenvalue of the principal sub-matrix pE is 
smaller than or equal to the largest eigenvalue of G(i) (see 
f4l I, Chapter 7, pp. 551-552). Thus pAmax is a lower bound 
on the largest eigenvalue of G(i). ■ 
Note that Hi is positive definite as it is a covariance matrix, 
hence it will have positive eigenvalues. From Lemma 2, the 
maximum eigenvalue of G(i), Amax — > 00 as p — )• 00. Thus 
the denominator in f5% grows to infinity in the limit p 00 
and 



lim 



From ( |42] | and ( |70] i. we have 



lim < 
lim < 

p—¥OQ 



1) lim 

p— >-oo 



(70) 



(71) 
(72) 



B. MAP cost 

The cost associated with the MAP solution of an erroneous 
sequence of symbols X^i-^ 7^ Afj^) is given as (see ll38ll . 
Chapter 11, pp. 672) 



Mx,,^ = yl) {l+p diag(A'(,))A^)/?^A(,)diag(Ar;:))j y^e, 

(73) 
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Mathematically, 

= Pr(M;e^^, < r) 



TABLE IV 

Total computational cost of the ML blind and training based 

ALGORITHMS AT HIGH SNR 



Pr(^3^") [I + P diag(Af(,))^^)/?;,^(,)diag(A'(,); 

By matrix inversion lemma 

diag(Ar(,))A[J)/?hA(,)diag(A'' 



< r 



(74) 



(0. 



= /-pdiag(Af(,;))A 



(0 



p A(,)diag(Af" )diag(Ar(,))A[^) 



A(,)diag(Ar^!5) 



= /-diag(A'(,))AH 



A(,)diag(Ar" )diag(A'(i))A^) A(,)diag(-*[^i)) 



(76) 



where 



D = 

A(,)diag(Ar(;)) 
Thus ( |74l ) can be written as 



P 



h 'T'-4(i)diag(Ar(,))diag(Af(,))A( 



P,=Pr(yf^)(/-r>)3^M <r 



(77) 



(78) 



note that (fTSl l is of the same form as (|49l ). The only difference 
in the LS and MAP costs is the presence of the term ^ 
in ( |77] l. Also note that this term depends on the inverse of 
the SNR. For low SNR, the inverse term in ( |77] i is always 
invertible due to the regularization term. At high SNR, the 
effect of regularization fades and inverse term in (iTTl i is 
invertible. At high SNR, i.e., p — > oo, i R^^ ^ and D 
of ( l76b takes the same form as that of LS cost leading to ( l72b . 

Table |IV] lists the estimated computational cost for our 
blind algorithm in the high SNR regime. Since there is no 
backtracking, the total number of iterations is N, which 
explains our calculations in Table |IV] It thus follows that the 
total number of operations needed for our algorithm is of the 
order 0{LN) in high SNR regime. The pilot based approach 
for channel estimation needs to invert an (L + 1) x (L + 1) 
matrix (assuming we need L + 1 pilots to estimate a channel 
of length L + 1) with a complexity of the order O(L^). Since 
the cyclic prefix is a fixed fraction of the OFDM symbol 
{L = N/m with m typically set to m = 4 or 8) we see that 
the complexity of the two approaches become comparable in 
the high SNR regime. 



VI. Simulation Results 

We consider an OFDM system with N = 16, or 64 
subcaiTiers and a CP of length i = -j. The uncoded data 
symbols are modulated using BPSK, 4-QAM, or 16-QAM. 
The constructed OFDM signal then passes through a channel 



Algorithm 


X 


+ 


Blind Algorithm 


(3L2 + IIL + 17)7V 


(2L2 + 5L + A)N 


Blind algorithm 
with 

canier reordering 


(4L + 13)7V 


(2L + 4)7V 


Training based 
algorithm 1391 


4L2 + 17L + 13 


2L2 + 6L + 4 



of length L + 1, which is assumed to be block fading (i.e., 
constant over one OFDM symbol but fades independently from 
one symbol to another) and whose taps follow an exponential 
decay profile (E[\h{t)\^] = e~°-2*). 

A. Bench marking 

We compare the performance of our algorithm against the 
following receivers 

1) the subspace-basecj^ blind receiver of ifTOl . 

2) the sphere decoding based receiver of |'28l, 

3) a receiver that acquires the channel through training with 
L + 1 pilots and a priori channel correlation Rh ||39]| . 

4) the ML receiver that acquires data through exhaustive 
search. 

The simulations are averaged over 500 Monte-Carlo runs. 

Figure [3] compares the BER performance of our algorithm 
with the aforementioned algorithms for an OFDM system 
with TV = 16 subcarriers and BPSK data symbols. Note 
in particular that our blind algorithm outperforms both the 
subspace and sphere decoding algorithms and almost matches 
the performance of the exhaustive search algorithm for low and 
high SNR, which confirms the ML nature of the algorithm. 

Figure m which considers the 4-QAM case, shows the same 
trends observed for the BPSK case of Figure [3] 

Figure |5] considers a more realistic OFDM symbol length 
{N — 64), drawn from a 4-QAM constellation and allows 
the SNR to grow to 45 dB. Our blind algorithm shows no 
eiTor floor signs, which is characteristic of non-ML methods. 
Furthermore, the algorithm beats the training-based method 
and follows closely the performance of the perfect channel 
case. Figure |6] shows the results with = 64 subcarriers and 
16-QAM data symbols for SNR as large as 50 dB. Again, the 
proposed blind algorithm does not reach an error floor. 

B. Low-Complexity Variations 

In this subsection, we investigate the low-complexity vari- 
ants of our algorithm. Specifically, we consider the perfor- 
mance of the blind algorithm with 

1) Pi set to J, 

2) Pi set to I with subcarrier reordering 

'The block fading assumption is maintained for all simulations. However, 
for the subspace blind receiver of |10| to work, the channel needs to stay 
constant over a sequence of OFDM symbols. For this particular receiver, the 
channel was kept fixed over 50 OFDM symbols. 
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SNR(dB) 



Fig. 3. BER vs SNR for BPSK OFDM over a Rayleigh channel with 

TV = 16 and L = 3 




io'°i • • 1 

5 10 15 20 25 30 35 40 

SNR(dB) 



Fig. 4. BER vs SNR for 4-QAM OFDM over a Rayleigh channel 
with TV = 16 and L = 3 



Figure |7] exhibits the comparisons for the various algorithms 
for BPSK and TV = 16. Note that with P, set to I arbitrarily, 
the performance of the blind algorithm deteriorates and the 
BER reaches an error floor Contrast this with the algorithm 
variant that uses subcarrier reordering as well, and note that the 
performance of this variant follows closely the performance of 
tiie exact blind algorithm. Also note that the BER of both of 
these algorithms beats that of the sphere decoding algorithm 
of ll28i . The same trends are observed in Figure [H] which 
considers the 4-QAM case. 

Figure |9] compares the average runtime of various algo- 
rithms as a function of the SNR. Note first that the extreme 
cases are the training-based receiver and the exhaustive search 
receiver, both of which are independent of the SNR. The 
runtime of the proposed algorithm decreases with the SNR and 
is sandwiched in-between the run time of the sphere decoding 
algorithm and that of the subspace algorithm for all values of 




10 15 20 25 30 35 40 45 

SNR{dB) 



Fig. 5. BER vs SNR for 4-QAM OFDM over a Rayleigh channel 
with TV = 64 and L = 15 




SNR(dB) 

Fig. 6. BER vs SNR for 16-QAM OFDM over a Rayleigh channel 
with TV = 64 and L = 15 

the SNrH Note that in the high SNR regime our algorithm 
runs at the same speed as the subspace algorithm. 

Figure [TO] shows the average runtime of the proposed 
algorithm with TV = 16 for various modulation schemes 
(BPSK, 4-QAM and 16-QAM). It is clear from the figure 
that the average runtime decreases considerably at higher SNR 
values. 

VII. Conclusion 

In this paper, we have proposed a low-complexity blind 
algorithm that is able to deal with channels that change on a 
symbol by symbol basis allowing it to deal with fast block fad- 
ing channels. The algorithm works for general constellations 
and is able to recover the data from output observations only. 

*The runtime of the subspace algorithm is adjusted to account for the fact 
that it requires the channel to be constant over a block of L + 1 OFDM 
syinbols. 
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SNR{dB) 



Fig. 7. Comparison of low-complexity algorithms for BPSK OFDM 
with iV = 16 and L = 3 




SNR{dB) 

Fig. 8. Comparison of low-complexity algorithms for 4-QAM OFDM 
with TV = 16 and L = 3 



Simulation results demonstrate the favorable performance of 
the algorithm for general constellations and show that its 
performance matches the performance of the exhaustive search 
for small values of A^. 

We have also proposed an approximate blind equalization 
method (avoiding Pi with subcarrier reordering) to reduce 
the computational complexity. As evident from the simulation 
results, this approximate method performs quite close to the 
exact blind algorithm and can work properly without a priori 
knowledge of the channel statistics. Finally, we study the 
complexity of our blind algorithm and show that it becomes 
especially low in the high SNR regime. 
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Abstract 

This paper proposes a low-complexity algorithm for blind equalization of data in OFDM-based wireless systems 
with general constellation. The proposed algorithm is able to recover data even when the channel changes on a 
symbol-by-symbol basis, making it suitable for fast fading channels. The proposed algorithm does not require any 
statistical information of the channel and thus does not suffer from latency normally associated with blind methods. 
We also demonstrate how to reduce the complexity of the algorithm, which becomes especially low at high SNR. 
Specifically, we show that in the high SNR regime, the number of operations is of the order 0{LN), where L is the 
cyclic prefix length and N is the total number of subcarriers. Simulation results confirm the favorable performance 
of our algorithm. 

Index Terms 

OFDM, channel estimation, maximum-likelihood detection, maximum a posteriori detection and recursive least 
squares. 
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I. Introduction 

Modem wireless communication systems are expected to meet an ever increasing demand for high data rates. A 
major hindrance for such high data rate systems is multipath fading. Orthogonal frequency division multiplexing 
(OFDM), owing to its robustness to multipath fading, has been incorporated in many existing standards (e.g., IEEE 
802.11, IEEE 802.16, DAB, DVB, HyperLAN, ADSL etc.) and is also a candidate for future wireless standards 
(e.g., IEEE 802.20). All current standards use pilot symbols to obtain channel state information (needed to perform 
coherent data detection). This reduces the bandwidth available for data transmission, e.g., the IEEE 802. lln standard 
uses 4 subcarriers for pilots, that is 7.1% of the available bandwidth, of the 56 subcarriers available for transmission. 
Blind equalization methods are advantageous as they do not require regular training/pilots symbols, thus freeing up 
valuable bandwidth. 

Several works exist in literature on blind channel estimation and equalization. A brief classification of these 
works based on a few commonly used constraints/assumptions is given in Table U (note that this list is not 
exhaustive). Broadly speaking, the literature on blind channel estimation can be classified into maximum-likelihood 
(ML) methods and non-ML methods. 

The non-ML methods include approaches based on subspace techniques ifTl- lfTOl . second-order statistics ifTTI . 
ifTll . |fT3l . Cholesky factorization llT4l . iterative methods ifHl . virtual carriers |16| real signal characteristics ifTTl 
and linear precoding |[T2l . ifTSl . Subspace-based methods Hl-llll, f/l- lfTOl generally have lower complexity but 
suffer from slow convergence as they require many OFDM symbols to get an accurate estimate of the channel 
autocorrelation matrix. Blind methods based on second-order statistics IfTTI . |[T2l . |fT3l also require the channel to 
be strictly stationary over several OFDM blocks. More often than not, this condition is not fulfilled in wireless 
scenarios (e.g., as in WLAN and fixed wireless applications). Methods based on Cholesky's factorization |[T4l and 
iterative techniques ifTSll suffer from high computational complexity. 

Several ML-based blind methods have been proposed in literature (201, Ull, ||2Tl-||35l, ||37l. Although they 
incur a higher computational cost, their superior performance and faster convergence is very attractive. These 
characteristics make this class of algorithms suitable for block fading scenarios with short channel coherence time. 
Usually, suboptimal approximations are used to reduce the computational complexity of ML-based methods. Though 
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TABLE I 

Literature Classification 





T imi t"f*H r\\7 

LjiiiiiLcu. uy 


i>HJL 111111 LCU u y 


Channel constant over 
ivl symDois, M > i 


CI, m, nil, 151, 161, 0, 13, M, 

m, flu, HI, mi, nisi, na, 

LL/Ji, LLol, IIAUi, yjj, ||24||, gz/i, UM 


m, ETi 


Uses pilots 
to resolve 
phase ambiguity 


m. El, a, mni, mi, 
m, nsi, ma, m, m\, 
m, m, f25i, pel, [371 




Constant modulus constellation 


[21, [31, [61, [91, [121, [131, 

m, mi, ii2Qi, m, 
isa, iia, im, usa, wi 


m, a, in, uni, m 
El, m. Hill 



these methods reduce the complexity of the exhaustive ML search, they still incur a significantly high computational 
cost. Some methods like |[2TI . Il23l . Il24l are sensitive to initialization parameters, while others work only for specific 
constellations (see Table lU). A few ML-based algorithms allow the channel to change on a symbol-by-symbol basis 
(e.g., Il26l . Wit ), however, these algorithms are only able to deal with constant modulus constellations. 

To the best of our knowledge no bUnd algorithm in literature is able to deal with channels that change from one 
OFDM symbol to another when the data symbols are drawn from a general constellation. Contrast this with the 
equalization algorithm presented in this paper. The key features of the blind equalization algorithm presented in 
this paper are that it 



1) works with an arbitrary constellation, 

2) can deal with channels that change for one symbol to the next, 

3) does not assume any statistical information about the channel. 



In addition, we propose a low-complexity implementation of the algorithm by utilizing the special structure of 
partial FFT matrices and prove that the complexity becomes especially low in the high SNR regime. 

The paper is organized as follows. Section Ull describes the system model and Section |lll] describes the blind 
equaUzation algorithm. Section |IV] presents an approximate method to reduce the computational complexity of the 
algorithm, while Section |V] evaluates this complexity in the high SNR regime. Section |Vl] presents the simulation 
results and Section IVIII gives the concluding remarks. 
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A. Notation 

We denote scalars with small-case letters, x, vectors with small-case boldface letters, x, while the individual 
entries of a vector h are denoted by h{l). Upper case boldface letters, X, represent matrices while calligraphic 
notation, X, is reserved for vectors in the frequency domain. A hat over a variable indicates an estimate of the 
variable, e.g., /i is an estimate of h. (.)^ and (.)^ denote the transpose and Hermitian operations, while the notation 
stands for element-by-element multiplication. The discrete Fourier transform (DFT) matrix is denoted by Q and 
defined as qi^k = e~^^^''~^^^'^~^^ with k,l = 1,2, - ■ ■ ,N (N is the number of subcarriers in the OFDM symbol), 
while the invrse DFT (IDFT) is denoted as Q^. The notation ||a||g represents the weighted norm defined as 
II a II 3 = a^Ba for some vector a and matrix B. 



II. System Model 



Consider an OFDM system where all the N available subcarriers are modulated by data symbols chosen from 
an arbitrary constellation. The frequency-domain OFDM symbol X, of size N x 1, undergoes an IDFT operation 
to produce the time-domain symbol x, i.e. 

X = VnQ^X. (1) 

The transmitter then appends a length L cyclic prefix (CP) to x and transmits it over the channel. The channel 
h, of maximum length L + 1 < N, is assumed to be constant for the duration of a single OFDM symbol, but 
could change from one symbol to the next. The received signal is a convolution of the transmitted signal with the 
channel observed in additive white circularly symmetric Gaussian noise n ~ M{0, 1). The CP converts the linear 
convolution relationship to circular convolution, which, in the frequency domain, reduces to an element-by-element 
operation. Discarding the CP, the frequency-domain received symbol is given by 

y = ^/pnQX+^r, (2) 



where p is the signal to noise ratio (SNR) and y, H, X ,J\f , are the -point DFT's of y, h, x, and additive noise 
n respectively, i.e. 



U = Q 



h 




1 



1 



1 



X = —=Qx, J\f = —=Qn, and 3^ = —=Qy 



N 



N 



(3) 



Note that h is zero padded before taking its A'^-point DFT. Let consist of first L + 1 columns of Q (i.e., A 
consist of first L + 1 rows of Q^), then 



n = A^h and h = AH. 



(4) 



This allows us to rewrite (O as 



y = ^d.i&g{X)A^h + J^. 



(5) 



III. Blind Equalization Approach 



Consider the input/output equation ([S]), which in its element by element form reads 

y{j) = ^pX{3)a}^h+M{3) (6) 

where aj is the jth column of A. The problem of joint ML channel estimation and data detection for OFDM 
channels can be cast as the following minimization problem 



Jml = min 11;^ - diag(A')A"/i"^ 

N 



min y |3^(i)- VP'^'Waf^P 
1=1 

(i N 
J2 \y{j) - v~p xij)afh\' + J2 \yu) - Vp xij)afhf \ (v 



j=l j=i+l 



where denotes the set of a 
A'(j) up to the time index i, i.e 



1 possible A^— dimensional signal vectors. Let us consider a partial data sequence 



and define Mx^^^ as the corresponding cost function, i.e. 

M;f„ = min ||3^(,) - ^ diag(A'(,))Ag)/if, (8) 

where consists of the first i rows of A^. 

In the following, we pursue an idea for blind equalization of single-input multiple-output systems first inspired 
by lfT9l . Let R be the optimal value for the objective function ^ (we show how to determine R in Section UlI-BI 
further ahead). If Mx^^-, > R, then A'(j) can not be the first i symbols of the ML solution to To prove 

this, let and denote the ML estimates and suppose that our estimate A'(j) satisfies 

^(i) = (i) (9) 
i.e. the estimate A'(j) matches the first i elements of the ML estimate. Then, we can write 

Hi,||2 



R = min \\y - ^ diag(X)A"-h 

h,xen'^ II V 



TV 



= - diag(A'5^)AH + \yij) - X^\j)afh^y 

j=i+i 

N 

= ||3^«-^/^diag(A'(,))A«/x'''f + ^ \yij)-^pX^Hj)afh'''^\\ (10) 

j=i+i 

where the last equation follows from Now, clearly 

\\y^i)-^dmg{X^i))Af,^h^'^f > rnin||:V(i)- Vpdiag(A'(,))Af^)/if (11) 

= ||3^(i)- Vpdiag(A'(,))AH^f, (12) 



'Thus, for example X^2) = [X{l),X{2)f and X^n) = [-^^(1), ■ • ■ , X{N)f = X. 



where h is the argument that minimizes the RHS of ([TTI ). Then 

TV 

j=i+l 

> rnin \\y^i^ - ^ diag(A'(i))A{^)h,f 

= M;,,,,. (13) 

So, for A'(j) to correspond to the first i symbols of the ML solution , we should have ^X(.^ < Note that 
the above represents a necessary condition only. Thus if Af(j) is such that ^ < R, then this does not necessarily 
mean that A'(j) coincides with X^-^. 

This suggests the following method for blind equalization. At each subcarrier frequency i, make a guess of 
the new value of and use that along with previous estimated values A'(l), (i — 1) to construct 



Estimate h so as to minimize M^, in (113] ) and calculate the resulting minimum value of M-i, . If the value of 

— ° ^(i) 

M^^^ < R, then proceed to i + 1. Otherwise, backtrack in some manner and change the guess of for some 
j < i. A problem with this approach is that for i < L + 1, given any choice of X{i), h can always be chosen by 
least-squares to make M-^^^ in ([T3] ) equal to zercQ Then, we will need at least L + 1 pilots defying the blind nature 
of our algorithm. Alternatively, our search tree should be at least L + 1 deep before we can obtain a nontrivial (i.e. 
nonzero) value for M-o . 

An alternative strategy would be to find h using weighted regularized least squares. Specifically, instead of 
minimizing the objective function J ml in equation (O, we minimize the maximum a posteriori (MAP) objective 
function 

Jmap= min {\\h\\l-^ + \\y -^dmg{X)A^hf\ (14) 
where Rh is the autocorrelation matrix of h (in Section ITVl we modify the blind algorithm to avoid the need for 



^Since A^^j is full rank for i < L + 1, diag(Af(i))A(^) is full rank too for each choice of diag(Af(i)) and so we will always find some 
h that will make the objective function in l ll3t zero (since h has L+1 degrees of freedom). 



channel statistics). Now the objective function in ([141 ) can be decomposed as 



Jmap = mill < 



N 



j=i j=i+i 



Given an estimate of the cost function reads 



^^•(,.1, = + 113^ - ^diag(Af(i_i))Ag_i)/if } 



mmse= [7?;^^ +pA(i_i)diag(i'(i_i))"diag(A'(i_i))Ag_ ' ^ 



Hi-i)J 



(15) 



(16) 

with the optimum value (see II38I . Chapter 12, pp. 671) 

h = y/p /?/jA(j„i)diag(A'(j„i))[/ + p diag(A'(j_ i))A«_,)/?,A(,_i)diag(A'(,_i))]-i3^(,_i) (17) 
and corresponding minimum cost (MMSE error) 



(18) 



If we have a guess of X{i), we can update the cost function and obtain ^x^.^ In fact, the cost function Af^ is 
the same as that of ^ with the additional observation y{i) and an additional regressor X{i)af, i.e. 

2 . 









y{i-i) 




diag(A'(,_i))AH 


M-{, = mill < 

'^(o h 



















h 



(19) 



We can thus, recursively update the value M^x^-^ based on M^^ using recursive least squares (RLS) 11381 . i.e. 



(20) 



hi = hi^i + Qi [y[i) - ^ X{i)afhi^i 



(21) 
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where 

9i = ./pi{i)X{ifP,.iai (22) 

lii) = (23) 

l + p\X{i)\^afP,^^a, 

Pi = P,_i-/5 7(i)|i(i)|2p,_ia,afPi_i (24) 

These recursions apply for all i and are initialized by 

M^^_^^ = 0, P_i = Rh, and h_i = 

Now, let R be the optimal value for the regularized objective function in (fT4l ). If the value R can be estimated, we 
can restrict the search of the blind MAP solution X to the offsprings of those partial sequences A'(j) that satisfy 
My, < R. This forms the basis for our exact blind algorithm described below. 

A. Exact Blind Algorithm 

In this subsection, we describe the algorithm used to find the MAP solution of the system. The algorithm employs 



the above set of iterations (EOt — (124]) to update the value of the cost function M-j^^ ^ which is then compared with 
the optimal value R. The input parameters for the algorithm are: the received channel output y, the initial search 
radius r, the modulation constellatioiu ^ and the IxA^ index vector /. 

The algorithm is described as follows (the algorithm is also described by the flowchart in Figure [l]) 

1) (Initialize) Set i = 1, = 1 and set X{i) = 

2) (Compare with bound) Compute and store the metric M-j^^^. If Af^^^ > r, go to 3; else, go to 4; 

3) (Backtrack) Find the largest 1 <j <i such that 

< If there exists such j, set i = j and go to 5; else go to 6. 

4) (Increment subcarrier) If i < set i = i + = 1, = and go to 2; else store current 
A'(^), update r = M.^^^^ and go to 3. 

^Examples of the modulation constellation are Q, are 4-QAM and 16-QAM. We use \Q\ to denote the constellation size and Q{k) for the 
fcth constellation point. For example, in 4-QAM \Q\ =4 and f2(l), • • ■ , f2(4) are the four constellation points of 4-QAM. The indicator 
refers to the last constellation point visited by our search algorithm at the ith subcarrier. 
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5) (Increment constellation) Set = + 1 and X{i) = Go to 2. 

6) (End/Restart) If a full-length sequence ^(^n) has been found in Step 4, output it as the MAP solution and 
terminate; otherwise, double r and go to 1. 

The essence of the algorithm is to eliminate any choice of the input that increments the objective function beyond 
the radius r. When such a case is confronted, the algorithm backtracks (Step 3 then Step 5) to the nearest subcarrier 
whose alphabet has not been exhausted (the nearest subcarrier will be the current subcarrier if its alphabet set is 
not exhausted). 

The other dimension the algorithm works on is properly sizing r; if r is too small such that we are not able to 
backtrack, the algorithm doubles r (Step 3 then Step 6). If on the other hand r is too large that we reach the last 
subcarrier too fast, the algorithm reduces r to the most recent value of the objective function, (r = Mx^^^^) and 
backtracks (Step 4 then Step 3). 




Fig. 1. Flowchart of the blind algorithm. 
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Remark 1: The backtracking algorithm depends heavily on calculating the cost function using (|20l)-(l24l). In the 
constant modulus case, the values of ^[^^(i)^ in equations ( [23] ) and (l24l) become constant (equal to p Ex) for all 
i, and the values of ^{i) and Pi become 

= ^^ r ^Hp (25) 

Pi = P^^l- p£xl{i)P^-la^afPi^l, (26) 

which are independent of the transmitted signal and thus can be calculated offline. 

Remark 2: The algorithm can also be used for a pilot-based standard. In this case, when the algorithm reaches a 
pilot holding-subcarrier, no backtracking is performed as the value of the data carrier is known perfectly. In the 
presence of pilots, it is wise to execute the algorithms over the pilot-holding subcarriers first and subsequently move 
to the data subcarriers. For equispaced comb-type pilots, (semi)-orthogonality of regressors is still guaranteed. 
Remark 3: Like all blind algorithms, we use one pilot bit to resolve the sign ambiguity (see references in Table ID. 



B. Determination of initial radius p, and r 

Our algorithm depends on p, and r which we need to determine. The receiver can easily estimate p by 
measuring the additive noise variance at its side. As for the channel covariance matrix R^, our simulations show 
that with carrier reordering we can replace R^ with identity with essentially no effect on performance. This becomes 
especially true in the high SNR regime. It remains to obtain an initial guess of the search radius r. To this emd, 
note that if h and X are perfectly known (with h drawn from J\f{0, Rh) but is known) then 

^ = \\h\\l-^ + \\y-VP diag{X) A^hf (27) 

is a chi-square random variable with k = 2{N + L + 1) degrees of freedorto Thus, the search radius should be 
chosen such that > r) < e, where > r) = 1 — F{r; k), and where F(r; k) is the cumulative distribution 



''The first term on the right hand side has 2(L + 1) degrees of freedom as h is Gaussian distributed wfiile the second term has 2N degrees 
of freedom as ^ — diag(Af) A^/i is just Gaussian noise. 
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function of the chi-square random variable given by 



Fir- k) = (28) 
^ '^^ T{k/2) ' ^ 



Here, r/2) is the lower incomplete gamma function defined as 



l-r/2 

7(A:/2, r/2) = / t^'"^'^ e'^ dt. (29) 





So, under this initial radius, we guarantee finding the MAP solution with probability at least 1 — e. In case a solution 
is not found, the algorithm doubles the value of r and starts over. This process continues until a solution is found. 
For example, when = 64, L = 15 and e = 0.01, the value of our radius should be set to 204. 

IV. An Approximate Blind Equalization Method 

There are two main sources that contribute to the complexity of the exact blind algorithm of Section |llll 

1) Calculating Pj.- the second step of the blind algorithm requires updating the metric M^^^ . This metric 
depends heavily on operations involving the (L + 1) x (L + 1) matrix Pi which are the most computationally 
expansive (see Table HI] which estimates the computational complexity of the RLS). 

2) Backtracking: When the condition M-^^ ^ < r is not satisfied, we need to backtrack and pursue another branch 
of the search tree. This represents a major source of complexity. 

In the following, we show how we can avoid calculating Pi all together. We postpone the issue of backtracking to 
Section jV] 

A. Avoiding Pi 

Note that in the RLS recursions (|20l)— (l24l). Pj always appears multiplied by a^. Let's see how this changes if we 
set P_i = I and assume that the a^'s are orthogonal or, in particular, if we assume that apaj+i = = 0. 

With these assumptions note that 

7(0) = . — = ^ (30) 

l + p |A'(0)PaHp_iao 1 + p |A'(0)|2(L + 1) 
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i.e., 7(0) is independent of Also note that 

Poai = P-iai- pj{0)\X{0)\^P_iaoa^P-iai 
= ai-/>7(0)|^(0)|2aoa^ai 

= ai. (31) 

For a similar reason 

Poa2 = a2. (32) 



From (I31I ). it is also easy to conclude that 

7(1) = ^ (33) 

1 + ^1-^(1)^(^ + 1) 

i.e., 7(1) is independent of Pq. Also, from ( [3T| ) and (|32l ) it follows that Pjaj+i = aj+i and Piai^2 = We 
now investigate what happens to Pj+i. 

Pi+iai+2 = Piai+2-P7(« + l)l'^'(^ + l)l^^iai+ia!+i-Piai+2 
= ai+2 - pj{i + + l)pai+ia^iai+2 

= ai+2- (34) 

Similarly, 

Pi+ifli+a = (35) 



So, by induction we see that each occurrence of PjOj in the recursion set (I20l)-(l23l) can be replaced with a^. This 
allows us to discard (l24l ). i.e., 

M^^^^ = M^^^_^^+ji{}\y{i)-^Xii)afh,_,\^ (36) 



hi = h,.^+gJy{i)-^Xii)afh,^, (37) 
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TABLE II 

Estimated computational cost per iteration of the RLS algorithm 



Term 


X 


+ 




/oX(i)a^h^ 1 


2L + 2 








1 
1 


1 

1 




P 


1 

i_ 




1 




1 
i 


1 

1 




I, 

hi 


r 1 


r 1 1 
iv + I 


1 
I 




r 2 1 r 1 1 






9i 


L + 3 








L + 1 


L 




7(i) 


3 


1 


1 


«fj'.-i 


L2 + 2L + 1 


L2 + L 




Pi 


L2 + 2L + 2 


L2 + 2L + 1 




Total per iteration 


+ IIL + 17 


2L^ + 5L + 4 


3 



where 



9i = ./p7ii)Xii)''a, (38) 

7(i) = ^ . (39) 

l + p\X{i)\^L + l) 

Thus, the approximate bhnd RLS algorithm is effectively running at LMS complexity. Table |ll] summarizes the 
computational complexity incurred in the RLS calculation. 



B. Avoiding Pi with Carrier Reordering 

The reduction in complexity above is based on two assumptions. The first assumption is to set P_i = / (instead 
of Rh) and the second is to assume that the consecutive a^'s are orthogonal. Note that the aj's are columns of 
A, i.e. they are partial FFT vectors. As such, strictly speaking, they are not orthogonal. Notice, however, that for 



U"-^(i-i')k) 



(40) 



A;=0 



which after straightforward manipulation can be shown to be 



\afai 



1 

L+l 



L + 

sin{Tr{i-i')j-) 



{Z = i') 
{i / Z') 



(41) 
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This is a function of (i — i') mod N . Thus, without loss of generality, we can set i' = 1 and plot this autocorrelation 
with respect to i. The autocorrelation decays with i as shown in Figure |2] We can use this observation in 




i 



Fig. 2. Autocorrelation vs i for = 64 and L — 15 

implementing our blind RLS algorithm. Specifically, note that the whole OFDM data is available to us and so we can 
visit the data subcarriers in any order we wish. The discussion above shows that the data subcarriers should be visited 
in the order i, i + A, i + 2 A, . . . where A should be chosen as large as possible to make a,, Oj+A, ttj+2A! • • • 
as orthogonal as possible, but small enough to avoid revisiting (or looping back to) a neighborhood too early. We 
found the choice A = to be a good compromise. From Figure [2l which plots (|4TI) for = 64 and L = 15, 
columns 1, 5, 9, 13, 17, 21, • • • ,61 are orthogonal to each other and so are the columns 2, 6, 10, 14, 18, • • • , 62. So, 
if the vectors are visited in the following order 1, 5, 9, 13, 17, 21, • • • , 61, 2, 6, 10, 14, 18, • • • , 62, • • • , then we have 
a consecutive set of vectors that are orthogonal. The only exception is in going from column 61 to 2. These two 
columns are not really orthogonal but are nearly orthogonal (the correlation of columns 1 and 61 is zero, so the 
correlation of 61 with 2 should be very small since the correlation function is continuous as shown in Figure H). 
In general, we chose A = and visit the columns in the order i + A,i + 2A, • • • ,i + LA, i = 1, - ■ ■ , A — 1. 

Our simulation results show that the BER we get with exact calculation of Pj and that obtained when we set 
P i = / with subcarrier reordering are almost the same. Table Hill gives the computational complexity incurred in 
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TABLE III 

Estimated computational cost per iteration of the RLS algorithm with Carrier Reordering 



Term 


X 


+ 






2L + 2 


L 




\yii)-^X{z)afh,^,\^ 


1 


1 




p-f{i) 


1 




1 




1 


1 




hi 


L + 2 


L + 1 


1 




3 


1 


1 


Total per iteration 


4L + 13 


2L + 4 


3 



the RLS calculation when subcarrier reordering is used (i.e., free from Pj calculation). 

Note that with subcarrier reordering, the new version of the RLS runs without the need to use the power delay 
profile statistics, which relieves us from the need to provide this information. 



V. Computational Complexity in the High SNR Regime 

In the section, we study the other source of complexity (backtracking) and show that there is almost no 
backtrackings in the high SNR regime. To this end, consider the behavior of the algorithm when processing the 
ith subcarrier. There are \Q\ different alphabet possibilities to choose from at this subcarrier and a similar number 
of possibilities at the preceding i — I subcarriers, creating a total of — 1 incorrect sequences A'(j) and one 
correct sequence ^(i)- The best case scenario is to have only one sequence that satisfies M^p^.^ < r in which case 
there would be only one node to visit. The worst case is having to visit the remaining — 1 wrong nodes before 
reaching the true sequence (visiting of nodes will happen through backtracking); this latter case is equivalent to the 
exhaustive search scenario (i.e., all possible sequences satisfy Mp^^,^ < r). Thus, if we let Cj denote the expected 
number of nodes visited at the ith subcarrier, then from above we can write 

a<i + - i)Pi (42) 



'The term "backtracking" refers to the case when the algorithm is currently at subcarrier i and it has to change the estimate of the 
data symbol at some subcarrier j < i. On the other hand, sweeping the constellation points at subcarrier to find the first one that satisfies 
^Xf^i-i < r is not considered backtracking. 
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where Pi is the maximum probabiUty that an erroneous sequence of symbols A'(j) 7^ A'(j) has a cost less than r. 
We will show that this probability becomes negligibly small at high SNR values. Recall that 



y^i) = Vpdiag(A'(,))Ag)/i +Ar(, 



i) 



(43) 



where J^(i) denotes the first i symbols of Af. Note the (l43l) can be written as 



Vpdiag(Af(i))AH / 



h 



(44) 



We first prove our claim for the least squares (LS) cost and then show how the MAP cost reduces to LS cost for 
high SNR. 



A. LS cost 

Suppose we have an erroneous sequence of symbols A'(j) 7^ '^(i)- The LS estimate of h is found by minimizing 
the objective function 

Jls = ^min^ - diag(Ar(i))A« /if} (45) 

and the solution of h is (see ll38l . Chapter 12, pp. 664) 

h = [A(i)diag(A'|^))diag(i'(i))Ag)]-i^ A(i)diag(A'J^)):V(i). (46) 

The cost associated with the LS solution is given by (see 1381 . Chapter 11, pp. 663) 

^^(.) = Vpdiag(A'(,))A« (VpA(,)diag(A'(,))«Vpdiag(-*(,))AH)"'vpA,)diag(A'J]))):y;(,) 

= (l-p diag(A'(,))AH (/, A(,)|diag(Af(i))|2Ag))~' A^^d\s.g{X%))y 

Mx,, = yl){l-D)y^^ (47) 



where 



D = diag(A'(,))AH (A(,)|diag(A'(,))|2AH j A(,)diag(A'^)). 
So the probabiUty that the sequence A'(j) satisfies Mp^^,^ < r reads 



= Pr(M^^^, < r) 



In the strict sense of the word, backtracking means visiting Step 3 in our algorithm. Substituting (|44] 



//r 



Pi = Pr 



VV 



h 



G 



h 



\ \ 

< r 



/ 



where 



G 



(0 



^ A(i)diag(A'(j)) 



\I-D] 



^diag(i'(i))Aj^) / 



Let B = diag(Ar(j))A|^-j, then can be written as 



pB^[I - D]B B^ [I -D]I 
III- D]B I\I -D]I 



which in compact form can be expressed as 



G 



pE E2 



^2 -^3 



Using the Chemoff bound the right hand side of (l50l ) can be bounded in the following way 







/ 


h 


H 


h 


\ 




Pi < e'^^'E 


exp 






Git) 












v 




















/ 
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Noting that 



with 



h 



'AA(0,S(,)) 



(55) 



Rh 
I, 



(56) 



we can solve the expression in (154] ) as 



exp 





h 


H 


h 


\ 
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h 


H 


h 


\ 












exp 












v 












v 
















/ 








) 



dhdAf 



fir ~r-{L+i+l) 



e f^' vr 



exp 



/ 


h 


H 


h 


\ 














dhdAf (j) 


V 








/ 













exp 



h 

AT, 



(i) 



g— /iryp(L+i+l) 

\ 



dhdAf, 



(i) 



/jr7]-(L+i+l) 



(57) 



Note that the numerator in (157] ) is a multi-variate complex Gaussian integral. Recall that an n-dimensional complex 
Gaussian integral has the solution (see |[T9l ) 



exp I - ||x||^ 



dx 



■K" 



det(VF) 



(58) 



This allows us to simplify (ISTt as 



det(S(j) +/iG(i)) 



(59) 



Next, we show that the probability — )• as p — )• oo. To show this, we just need to show that the largest eigenvalue 
of the term in the denominator goes to infinity as p — )^ cxd. 



Lemma 1: Let E = A(j)diag(A'(j))[/ — Z)]diag(A'(j))A^^ be a (L + 1) x + matrix, then for any sequence 
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Pt'(j), E has a positive maximum eigenvalue, A^ax and a corresponding unit-norm eigenvector v of size (L + 1) x 1. 
Proof: Recall that 

D = diag(A'(,))Ag) (A(,)diag(A'[l))diag(A'(,))AH A(,)diag(A'^)) (60) 

and let F = diag(A'(j))A|^^, then we can write the above equation as 

D = F (F^F) ~^ F^ = FF'f (61) 

where F^ = (^F^F) ^ F^ is the Moore-Penrose pseudo-inverse^ (see BTI . Chapter 5, pp. 422). Therefore, D is 
an idempotent matrix with eigenvalues equal to either or 1 BOl and hence, [/ — D] is also a positive semi-definite 
idempotent matrix. Note also that the matrix E in (1531 ) can be written as 



E = A(i)diag(Af|;))[J-r>]diag(Ar(i))Ag) 



B^[I-D]B (62) 



and 

z^Ez = z^B^[I - D]Bz = (-Bz)"[/ - D]{Bz) > (63) 
and so E is Hermitian and positive semi-definite. 

Let U = [ui U2 • • • ul_|_i] be a (L + 1) X (L + 1) unitary matrix where Uj is the ith eigenvector then, 
E = UAU^ where A is a diagonal matrix containing ordered eigenvalues of E such that Ai > A2 > • • • > A^+i. 



^the columns of F are linearly independent. 
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Let z = [/^v, then the maximum eigenvalue of E is given as 



max v^£^v = max z^Az (64) 

v||2 = l ||z||2 = l 



L+1 



max 

|z||2 = 



c y,X^\zi\^ (65) 

1=1 
L+1 

< max Ai > IzjP (66) 

|z||2 = l ^ 
1 = 1 

< Ai = Amax (67) 



The equaUty is attained when v is the eigenvector of An 



Lemma 2: Given that E has a positive maximum eigenvalue A^ax with corresponding unit-norm vector v of 
size (L + 1) X 1, then the maximum eigenvalue of in (|52l ) is lower bounded by w^G(j)W = p Amax where 



w 



V(L+l)xl 
Oixl 



(68) 



Proof: From Lemma 1, the largest eigenvalue of E is Amax- It follows that the largest eigenvalue of pE is 
pAmax- Let A'max be the largest eigenvalue of From (l53l) . we can see that pE is a principal sub-matrix of 
(see |41], Chapter 7, pp. 494) and thus 

(69) 

i.e., the largest eigenvalue of the principal sub-matrix pE is smaller than or equal to the largest eigenvalue of 
(see fin . Chapter 7, pp. 551-552). Thus pAmax is a lower bound on the largest eigenvalue of ■ 



Note that is positive definite as it is a covariance matrix, hence it will have positive eigenvalues. From Lemma 
2, the maximum eigenvalue of Amax — oo as p — oo. Thus the denominator in ( [59l ) grows to infinity in the 

limit p — )• oo and 

lim (70) 

p— i>00 
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From (1421 ) and (1701 ). we have 



lim Ci < 1 + - 1) lim Pi (71) 
lim Ci < 1 (72) 

p— >CJO 



S. MAP cost 



The cost associated with the MAP solution of an erroneous sequence of symbols A'(j) ^ A'(j) is given as (see 
21, Chapter 11, pp. 672) 



(73) 



Mathematically, 



= PrU^||J/ + pdiag(A'(,))A||)/?;,A(i)diag(Ar^))j y^i^< 



(74) 



By matrix inversion lemma 



+ y/p diag(A'(i))Ag)/?,,A(i)diag(A'(i)) 



-1 



I-p diag(A'(,))A[^) + p A(,)diag(A'^))diag(A'(,))A[|)J A(,)diag(A'^)) (75) 
/-diag(A'(i))A}^) i 7?^^ + A(i)diag(ArJ]))diag(A'(,))A}^) A(i)diag(Ar|])) 



7- i:) 



(76) 



where 



£> = diag(A'(i))A 



^ 7?^^ + A(i)diag(A'J]))diag(Ar(i))Ag) 



-1 



A(i)diag(A'(i)) 



(77) 



Thus (1741) can be written as 



(78) 
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TABLE IV 

Total computational cost of the ML blind and training based algorithms at high SNR 



Algorithm 


X 


+ 


Blind Algorithm 


(3L^ + 11L + 17)iV 


(2L^ + 5L + 4)A^ 


Blind Alg. with Carrier Reordering 


(4L + 13)A^ 


{2L + A)N 


Training Based Algorithm L3_9J 


4L^ + 17L + 13 


2L^ + 6L + 4 



note that (1781 ) is of the same form as (l49l ). The only difference in the LS and MAP costs is the presence of the 
term ^ in (ITT] ). Also note that this term depends on the inverse of the SNR. For low SNR, the inverse term in 
dTTJ is always invertible due to the regularization term. At high SNR, the effect of regularization fades and inverse 
term in (177] ) is invertible. At high SNR, i.e., p — >■ oo, ^ RJ^^ —?• and D of (1761 ) takes the same form as that of 
LS cost leading to (1721) . 

Table |IV] lists the estimated computational cost for our blind algorithm in the high SNR regime. Since there is 
no backtracking, the total number of iterations is N, which explains our calculations in Table JV] It thus follows 
that the total number of operations needed for our algorithm is of the order 0{LN) in high SNR regime. The pilot 
based approach for channel estimation needs to invert an (L + 1) x (L + 1) matrix (assuming we need L + 1 pilots 
to estimate a channel of length L + 1) with a complexity of the order 0{L^). Since the cyclic prefix is a fixed 
fraction of the OFDM symbol (L = N/m with m typically set to m = 4 or 8) we see that the complexity of the 
two approaches become comparable in the high SNR regime. 

VI. Simulation Results 

We consider an OFDM system with = 16, or 64 subcarriers and a CP of length L = ^. The uncoded data 
symbols are modulated using BPSK, 4-QAM, or 16-QAM. The constructed OFDM signal then passes through a 
channel of length L + 1, which is assumed to be block fading (i.e., constant over one OFDM symbol but fades 
independently from one symbol to another) and whose taps follow an exponential decay profile = e~'^'^*). 

A. Bench marking 

We compare the performance of our algorithm against the following receivers 
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1) the subspace-basecy blind receiver of ifTOl . 

2) the sphere decoding based receiver of f2E\, 

3) a receiver that acquires the channel through training with L + 1 pilots and a priori channel correlation 



4) the ML receiver that acquires data through exhaustive search. 
The simulations are averaged over 500 Monte-Carlo runs. 

Figure [3] compares the BER performance of our algorithm with the aforementioned algorithms for an OFDM 
system with = 16 subcarriers and BPSK data symbols. Note in particular that our blind algorithm outperforms 
both the subspace and sphere decoding algorithms and almost matches the performance of the exhaustive search 
algorithm for low and high SNR, which confirms the ML nature of the algorithm. 

Figure IH which considers the 4-QAM case, shows the same trends observed for the BPSK case of Figure [3] 




10 15 

SNR(dB) 



Fig. 3. BER vs SNR for BPSK OFDM over a Rayleigh channel with = 16 and L = 3 



Figure [5] considers a more realistic OFDM symbol length (N = 64), drawn from a 4-QAM constellation and 
allows the SNR to grow to 45 dB. Our blind algorithm shows no error floor signs, which is characteristic of non-ML 

^The block fading assumption is maintained for all simulations. However, for the subspace blind receiver of |10| to work, the channel 
needs to stay constant over a sequence of OFDM symbols. For this particular receiver, the channel was kept fixed over 50 OFDM symbols. 
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Fig. 4. BER vs SNR for 4-QAJVI OFDIVI over a Rayleigh channel with iV = 16 and L = 3 

methods. Fuithermore, the algorithm beats the training-based method and follows closely the performance of the 
perfect channel case. Figure [6] shows the results with = 64 subcarriers and 16-QAM data symbols for SNR as 
large as 50 dB. Again, the proposed blind algorithm does not reach an error floor. 
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-H — Channel estimated using L+1 pilots and corr. 
© - Proposed Blind Equalization Algorithm 
— Perfectly known channel 
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Fig. 5. BER vs SNR for 4-QAM OFDM over a Rayleigh channel with iV = 64 and L = 15 
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SNR(dB) 

Fig. 6. BER vs SNR for 16-QAM OFDM over a Rayleigh channel with iV = 64 and L = 15 

B. Low-Complexity Variations 

In this subsection, we investigate the low-complexity variants of our algorithm. Specifically, we consider the 
performance of the blind algorithm with 

1) Pi set to /, 

2) Pi set to / with subcarrier reordering 

Figure |7] exhibits the comparisons for the various algorithms for BPSK and N = 16. Note that with Pi set to / 
arbitrarily, the performance of the blind algorithm deteriorates and the BER reaches an error floor. Contrast this 
with the algorithm variant that uses subcarrier reordering as well, and note that the performance of this variant 
follows closely the performance of the exact blind algorithm. Also note that the BER of both of these algorithms 
beats that of the sphere decoding algorithm of |[28l . The same trends are observed in Figure |8j which considers the 
4-QAM case. 

Figure |9] compares the average runtime of various algorithms as a function of the SNR. Note first that the extreme 
cases are the training-based receiver and the exhaustive search receiver, both of which are independent of the SNR. 
The runtime of the proposed algorithm decreases with the SNR and is sandwiched in-between the run time of the 
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Fig. 7. Comparison of low-complexity algorithms for BPSK OFDIVI with = 16 and L = 3 

sphere decoding algorithm and that of the subspace algorithm for all values of the SNRq Note that in the high 
SNR regime our algorithm runs at the same speed as the subspace algorithm. 

Figure [10] shows the average runtime of the proposed algorithm with = 16 for various modulation schemes 
(BPSK, 4-QAlVI and 16-QAlVI). It is clear from the figure that the average runtime decreases considerably at higher 
SNR values. 



VII. Conclusion 

In this paper, we have proposed a low-complexity blind algorithm that is able to deal with channels that change 
on a symbol by symbol basis allowing it to deal with fast block fading channels. The algorithm works for general 
constellations and is able to recover the data from output observations only. Simulation results demonstrate the 
favorable performance of the algorithm for general constellations and show that its performance matches the 
performance of the exhaustive search for small values of N. 

We have also proposed an approximate blind equalization method (avoiding Pj with subcarrier reordering) to 
reduce the computational complexity. As evident from the simulation results, this approximate method performs 



The runtime of the subspace algorithm is adjusted to account for the fact that it requires the channel to be constant over a block of L + 1 
OFDM symbols. 
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Fig. 8. Comparison of low-complexity algorithms for 4-QAM OFDM with 16 and L = 3 
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Fig. 9. Average time comparison for BPSK data symbols with A^ = 16 and L = 3 



quite close to the exact blind algorithm and can work properly without a priori knowledge of the channel statistics. 
Finally, we study the complexity of our blind algorithm and show that it becomes especially low in the high SNR 
regime. 
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Fig. 10. Average Time Comparison for our Blind Algorithm for Different Modulation with iV = 16 and L = 3 
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